ICML 2011
TechTalks from event: ICML 2011
Bandits and Online Learning
-
Unimodal BanditsWe consider multiarmed bandit problems where the expected reward is unimodal over partially ordered arms. In particular, the arms may belong to a continuous interval or correspond to vertices in a graph, where the graph structure represents similarity in rewards. The unimodality assumption has an important advantage: we can determine if a given arm is optimal by sampling the possible directions around it. This property allows us to quickly and efficiently find the optimal arm and detect abrupt changes in the reward distributions. For the case of bandits on graphs, we incur a regret proportional to the maximal degree and the diameter of the graph, instead of the total number of vertices.
-
On tracking portfolios with certainty equivalents on a generalization of Markowitz model: the Fool, the Wise and the AdaptivePortfolio allocation theory has been heavily influenced by a major contribution of Harry Markowitz in the early fifties: the mean-variance approach. While there has been a continuous line of works in on-line learning portfolios over the past decades, very few works have really tried to cope with Markowitz model. A major drawback of the mean-variance approach is that it is approximation-free only when stock returns obey a Gaussian distribution, an assumption known not to hold in real data. In this paper, we first alleviate this assumption, and rigorously lift the mean-variance model to a more general mean-divergence model in which stock returns are allowed to obey any exponential family of distributions. We then devise a general on-line learning algorithm in this setting. We prove for this algorithm the first lower bounds on the most relevant quantity to be optimized in the framework of Markowitz model: the certainty equivalents. Experiments on four real-world stock markets display its ability to track portfolios whose cumulated returns exceed those of the best stock by orders of magnitude.
-
Beat the Mean BanditThe Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also known as bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we extend the Dueling Bandits Problem to a relaxed setting where preference magnitudes can violate transitivity. We present the first algorithm for this more general Dueling Bandits Problem and provide theoretical guarantees in both the online and the PAC settings. Furthermore, we show that the new algorithm has stronger guarantees than existing results even in the original Dueling Bandits Problem, which we validate empirically.
-
Multiclass Classification with Bandit Feedback using Adaptive RegularizationWe present a new multiclass algorithm in the bandit framework, where after making a prediction, the learning algorithm receives only partial feedback, i.e., a single bit of right-or-wrong, rather then the true label. Our algorithm is based on the second-order Perceptron, and uses upper-confidence bounds to trade off exploration and exploitation. We analyze this algorithm in a partial adversarial setting, where instances are chosen adversarially, while the labels are chosen according to a linear probabilistic model, which is also chosen adversarially. From the theoretical viewpoint, we show a regret of O(sqrt{T}log(T)), which improves over the current best bounds of O(T^{2/3}) in the fully adversarial setting. We evaluate our algorithm on nine real-world text classification problems, obtaining state-of-the-art results, even compared with non-bandit online algorithms, especially when label noise is introduced.
- All Sessions
- Keynotes
- Bandits and Online Learning
- Structured Output
- Reinforcement Learning
- Graphical Models and Optimization
- Recommendation and Matrix Factorization
- Neural Networks and Statistical Methods
- Latent-Variable Models
- Large-Scale Learning
- Learning Theory
- Feature Selection, Dimensionality Reduction
- Invited Cross-Conference Track
- Neural Networks and Deep Learning
- Latent-Variable Models
- Active and Online Learning
- Tutorial : Collective Intelligence and Machine Learning
- Tutorial: Machine Learning in Ecological Science and Environmental Policy
- Tutorial: Machine Learning and Robotics
- Ensemble Methods
- Tutorial: Introduction to Bandits: Algorithms and Theory
- Tutorial: Machine Learning for Large Scale Recommender Systems
- Tutorial: Learning Kernels
- Test-of-Time
- Best Paper
- Robotics and Reinforcement Learning
- Transfer Learning
- Kernel Methods
- Optimization
- Learning Theory
- Invited Cross-Conference Session
- Neural Networks and Deep Learning
- Reinforcement Learning
- Bayesian Inference and Probabilistic Models
- Supervised Learning
- Social Networks
- Evaluation Metrics
- statistical relational learning
- Outlier Detection
- Time Series
- Graphical Models and Bayesian Inference
- Sparsity and Compressed Sensing
- Clustering
- Game Theory and Planning and Control
- Semi-Supervised Learning
- Kernel Methods and Optimization
- Neural Networks and NLP
- Probabilistic Models & MCMC
- Online Learning
- Ranking and Information Retrieval