Online learning is an approach to statistical inference based on the idea of playing a repeated game. A "master" algorithm recieves the prediction of N experts before making its own prediction. Then the outcome is revealed, and experts and master suffer a loss.

Algorithms have been developed for which the regret, the difference between the cumulative loss of the master and the cumulative loss of the best expert, is bounded uniformly over all sequences of expert predictions and outcome.

The most successful algorithms of this type are the exponential weights algorithms discovered by Littlestone and Warmuth and refined by many others. The exponential weights algorithm has a parameter, the learning rate, which has to be tuned appropriately to achieve the best bounds. This tuning typically depends on the number of experts and on the cumulative loss of the best expert. We describe a new algorithm - NormalHedge, which has no parameter and achieves comparable bounds to tuned exponential weights algorithms.

As the algorithm does not depend on the number of experts it can be used effectively when the set of experts grows as a function of time and when the set of experts is uncountably infinite.

In addition, the algorithm has a natural extension for continuous time and has a very tight analysis when the cumulative loss is described by an Ito process.

This is joint work with Kamalika Chaudhuri and Daniel Hsu.

Questions and Answers

You need to be logged in to be able to post here.