TechTalks from event: Sixth Annual Machine Learning Symposium
This event was organized and sponsored by The New York Academy of Sciences.
Stochastic Algorithms for One-Pass Learning
The goal of the presentation is to describe practical stochastic gradient algorithms that process each training
example only once, yet asymptotically match the performance of the true optimum. This statement needs, of
course, to be made more precise. To achieve this, we'll review the works of Nevel'son and Has'minskij (1972),
Fabian (1973, 1978), Murata & Amari (1998), Bottou & LeCun (2004), Polyak & Juditsky (1992), Wei Xu (2010),
and Bach & Moulines (2011). We will then show how these ideas lead to practical algorithms that not only
represent a new state of the art but are also arguably optimal.
Online Learning without a Learning Rate Parameter
Online learning is an approach to statistical inference based on the idea of playing a repeated game. A "master"
algorithm recieves the prediction of N experts before making its own prediction. Then the outcome is revealed,
and experts and master suffer a loss.
Algorithms have been developed for which the regret, the difference between the cumulative loss of the
master and the cumulative loss of the best expert, is bounded uniformly over all sequences of expert
predictions and outcome.
The most successful algorithms of this type are the exponential weights algorithms discovered by Littlestone
and Warmuth and refined by many others. The exponential weights algorithm has a parameter, the learning
rate, which has to be tuned appropriately to achieve the best bounds. This tuning typically depends on the
number of experts and on the cumulative loss of the best expert. We describe a new algorithm - NormalHedge,
which has no parameter and achieves comparable bounds to tuned exponential weights algorithms.
As the algorithm does not depend on the number of experts it can be used effectively when the set of experts
grows as a function of time and when the set of experts is uncountably infinite.
In addition, the algorithm has a natural extension for continuous time and has a very tight analysis when the
cumulative loss is described by an Ito process.
This is joint work with Kamalika Chaudhuri and Daniel Hsu.