ICML 2011
TechTalks from event: ICML 2011
Robotics and Reinforcement Learning
-
Conjugate Markov Decision ProcessesMany open problems involve the search for a mapping that is used by an algorithm solving an MDP. Useful mappings are often from the state set to some other set. Examples include representation discovery (a mapping to a feature space) and skill discovery (a mapping to skill termination probabilities). Different mappings result in algorithms achieving varying expected returns. In this paper we present a novel approach to the search for any mapping used by any algorithm attempting to solve an MDP, for that which results in maximum expected return.
-
Approximate Dynamic Programming for Storage ProblemsStorage problems are an important subclass of stochastic control problems. This paper presents a new method, approximate dynamic programming for storage, to solve storage problems with continuous, convex decision sets. Unlike other solution procedures, ADPS allows math programming to be used to make decisions each time period, even in the presence of large state variables. We test ADPS on the day ahead wind commitment problem with storage.
-
Apprenticeship Learning About Multiple IntentionsIn this paper, we apply tools from inverse reinforcement learning (IRL) to the problem of learning from (unlabeled) demonstration trajectories of behavior generated by varying ``intentions'' or objectives. We derive an EM approach that clusters observed trajectories by inferring the objectives for each cluster using any of several possible IRL methods, and then uses the constructed clusters to quickly identify the intent of a trajectory. We show that a natural approach to IRL---a gradient ascent method that modifies reward parameters to maximize the likelihood of the observed trajectories---is successful at quickly identifying unknown reward functions. We demonstrate these ideas in the context of apprenticeship learning by acquiring the preferences of a human driver in a simple highway car simulator.
-
Classification-based Policy Iteration with a CriticIn this paper, we study the effect of adding a value function approximation component (critic) to rollout classification-based policy iteration (RCPI) algorithms. The idea is to use a critic to approximate the return after we truncate the rollout trajectories. This allows us to control the bias and variance of the rollout estimates of the action-value function. Therefore, the introduction of a critic can improve the accuracy of the rollout estimates, and as a result, enhance the performance of the RCPI algorithm. We present a new RCPI algorithm, called direct policy iteration with critic (DPI-Critic), and provide its finite-sample analysis when the critic is based on the LSTD method. We empirically evaluate the performance of DPI-Critic and compare it with DPI and LSPI in two benchmark reinforcement learning problems.
- All Sessions
- Keynotes
- Bandits and Online Learning
- Structured Output
- Reinforcement Learning
- Graphical Models and Optimization
- Recommendation and Matrix Factorization
- Neural Networks and Statistical Methods
- Latent-Variable Models
- Large-Scale Learning
- Learning Theory
- Feature Selection, Dimensionality Reduction
- Invited Cross-Conference Track
- Neural Networks and Deep Learning
- Latent-Variable Models
- Active and Online Learning
- Tutorial : Collective Intelligence and Machine Learning
- Tutorial: Machine Learning in Ecological Science and Environmental Policy
- Tutorial: Machine Learning and Robotics
- Ensemble Methods
- Tutorial: Introduction to Bandits: Algorithms and Theory
- Tutorial: Machine Learning for Large Scale Recommender Systems
- Tutorial: Learning Kernels
- Test-of-Time
- Best Paper
- Robotics and Reinforcement Learning
- Transfer Learning
- Kernel Methods
- Optimization
- Learning Theory
- Invited Cross-Conference Session
- Neural Networks and Deep Learning
- Reinforcement Learning
- Bayesian Inference and Probabilistic Models
- Supervised Learning
- Social Networks
- Evaluation Metrics
- statistical relational learning
- Outlier Detection
- Time Series
- Graphical Models and Bayesian Inference
- Sparsity and Compressed Sensing
- Clustering
- Game Theory and Planning and Control
- Semi-Supervised Learning
- Kernel Methods and Optimization
- Neural Networks and NLP
- Probabilistic Models & MCMC
- Online Learning
- Ranking and Information Retrieval