TechTalks from event: Big Learning: Algorithms, Systems, and Tools for Learning at Scale

We are still uploading slides and videos for this event. Please excuse any discrepancy.

Day 2 Morning Session

• Hazy: Making Data-driven Statistical Applications Easier to build and Maintain Authors: Chris Re
The main question driving my group's research is: how does one deploy statistical data-analysis tools to enhance data driven systems? Our goal is to find abstractions that one needs to deploy and maintain such systems. In this talk, I describe my group's attack on this question by building a diverse set of statistical-based data-driven applications: a system whose goal is to read the Web and answer complex questions, a muon detector in collaboration with a neutrino telescope called IceCube, and a social-science applications involving rich content (OCR and speech data). Even in this diverse set, my group has found common abstractions that we are exploiting to build and to maintain systems. Of particular relevance to this workshop is that I have heard of applications in each of these domains referred to as â€œbig data.â€ Nevertheless, in our experience in each of these tasks, after appropriate preprocessing, the relevant data can be stored in a few terabytes -- small enough to fit entirely in RAM or on a handful of disks. As a result, it is unclear to me that scale is the most pressing concern for academics. I argue that dealing with data at TB scale is still challenging, useful, and fun, and I will describe some of our work in this direction. This is joint work with Benjamin Recht, Stephen J. Wright, and the Hazy Team
• Poster Spotlights Authors: Poster presenters
• The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo Authors: Matt Hoffman
Adaptively Setting Path Lengths in Hamiltonian Monte Carlo Hamiltonian Monte Carlo (HMC) is a Markov Chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlations that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than popular methods such as random walk Metropolis or Gibbs sampling. However, HMC's performance is highly sensitive to two user-specified parameters: a step size $\epsilon$ and a desired number of steps $L$. In particular, if $L$ is too small then the algorithm exhibits undesirable random walk behavior, while if $L$ is too large the algorithm wastes computation. We present the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps $L$. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. NUTS is able to achieve similar performance to a well tuned standard HMC method, without requiring user intervention or costly tuning runs. NUTS can thus be used in applications such as BUGS-style automatic inference engines that require efficient "turnkey'' sampling algorithms.
• Real time data sketches Authors: Alex Smola
I will describe a set of algorithms for extending streaming and sketching algorithms to real time analytics. These algorithm captures frequency information for streams of arbitrary sequences of symbols. The algorithm uses the Count-Min sketch as its basis and exploits the fact that the sketching operation is linear. It provides real time statistics of arbitrary events, e.g.\ streams of queries as a function of time. In particular, we use a factorizing approximation to provide point estimates at arbitrary (time, item) combinations. The service runs in real time, it scales perfectly in terms of throughput and accuracy, using distributed hashing. The latter also provides performance guarantees in the case of machine failure. Queries can be answered in constant time regardless of the amount of data to be processed. The same distribution techniques can also be used for heavy hitter detection in a distributed scalable fashion.
• Randomized Smoothing for (Parallel) Stochastic Optimization Authors: John Duchi
By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates for stochastic optimization procedures, both in expectation and with high probability, that have optimal dependence on the variance of the gradient estimates. To the best of our knowledge, these are the first variance-based rates for non-smooth optimization. A combination of our techniques with recent work on decentralized optimization yields order-optimal parallel stochastic optimization algorithms. We give applications of our results to statistical machine learning problems, providing experimental results demonstrating the effectiveness of our algorithms.