ICML 2011
TechTalks from event: ICML 2011
Game Theory and Planning and Control

Integrating Partial Model Knowledge in Model Free RL Algorithms
In reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel algorithm that combines them, switching between a model based and a model free mode depending on the current environmental state and on the status of the agent's knowledge. We prove that such an approach leads to improved performance whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach and suggest its efficacy in boosting policy gradient learning.

Task Space Retrieval Using Inverse Feedback Control
Learning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. A common approach is learning from demonstration: given examples of correct motions, learn a policy mapping state to action consistent with the training data. However, the usual approaches do not answer the question of what are appropriate representations to generate motions for specific tasks. Inspired by Inverse Optimal Control, we present a novel method to learn latent costs, imitate and generalize demonstrated behavior, and discover a task relevant motion representation: Task Space Retrieval Using Inverse Feedback Control (TRIC). We use the learned latent costs to create motion with a feedback controller. We tested our method on robot grasping of objects, a challenging high-dimensional task. TRIC learns the important control dimensions for the grasping task from a few example movements and is able to robustly approach and grasp objects in new situations.

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.

Approximating Correlated Equilibria using Relaxations on the Marginal Polytope
In game theory, a Correlated Equilibrium (CE) is an equilibrium concept that generalizes the more well-known Nash Equilibrium. If the game is represented as a graphical game, the computational complexity of computing an optimum CE is exponential in the treewidth of the graph. In settings where this exact computation is not feasible, it is desirable to approximate the properties of the CE, such as its expected social utility and marginal probabilities. We study outer relaxations of this problem that yield approximate marginal strategies for the players under a variety of utility functions. Results on simulated games and in a real problem involving drug design indicate that our approximations can be highly accurate and can be successfully used when exact computation of CE is infeasible.
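
For readers unfamiliar with the CE concept, the following minimal sketch checks the standard CE incentive constraints by brute force in a two-player game and computes the expected social utility. It is not the relaxation method of the talk: the game (a textbook "Chicken" payoff table), the names `CHICKEN`, `is_correlated_equilibrium`, and `expected_social_utility`, and all numbers are illustrative assumptions.

```python
# Illustrative two-player game (assumed payoffs, not from the talk).
# Keys are joint actions (row, column); values are (row utility, column utility).
ACTIONS = ("D", "C")  # "dare" and "chicken"
CHICKEN = {
    ("D", "D"): (0, 0),
    ("D", "C"): (7, 2),
    ("C", "D"): (2, 7),
    ("C", "C"): (6, 6),
}


def is_correlated_equilibrium(utilities, dist, actions=ACTIONS, tol=1e-9):
    """Return True if no player gains by deviating from any recommended action.

    `dist` maps joint actions to probabilities; missing entries count as 0.
    """
    for player in (0, 1):
        for recommended in actions:
            for deviation in actions:
                gain = 0.0
                for other in actions:
                    if player == 0:
                        joint, deviated = (recommended, other), (deviation, other)
                    else:
                        joint, deviated = (other, recommended), (other, deviation)
                    # Probability-weighted gain from swapping recommended -> deviation.
                    gain += dist.get(joint, 0.0) * (
                        utilities[deviated][player] - utilities[joint][player]
                    )
                if gain > tol:
                    return False
    return True


def expected_social_utility(utilities, dist):
    """Expected sum of both players' utilities under the joint distribution."""
    return sum(p * sum(utilities[joint]) for joint, p in dist.items())


# Uniform mixture over three joint actions, a well-known CE of this game.
uniform_three = {("D", "C"): 1 / 3, ("C", "D"): 1 / 3, ("C", "C"): 1 / 3}
# Always recommending (C, C) is not a CE: each player would rather dare.
always_chicken = {("C", "C"): 1.0}
```

The brute-force check enumerates every joint action, which is exactly what becomes infeasible in large graphical games and what motivates approximating CE properties instead.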

Generalized Value Functions for Large Action Sets
The majority of value function approximation based reinforcement learning algorithms available today focus on approximating the state (V) or state-action (Q) value function, and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present a unified view of V and Q functions and arrive at a new space-efficient representation, where action selection can be done exponentially faster, without the use of a model. We then describe how to calculate this new value function efficiently via approximate linear programming and provide experimental results that demonstrate the effectiveness of the proposed approach.