ICML 2011
TechTalks from event: ICML 2011
Game Theory and Planning and Control

Integrating Partial Model Knowledge in Model Free RL AlgorithmsIn reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel algorithm which combines them into a single algorithm, which switches between a model based and a model free mode, depending on the current environmental state and on the status of the agent's knowledge. We prove that such an approach leads to improved performance whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach and suggest its efficacy in boosting policy gradient learning.

Task Space Retrieval Using Inverse Feedback ControlLearning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. A common approach is learning from demonstration: given examples of correct motions, learn a policy mapping state to action consistent with the training data. However, the usual approaches do not answer the question of what are appropriate representations to generate motions for specific tasks. Inspired by Inverse Optimal Control, we present a novel method to learn latent costs, imitate and generalize demonstrated behavior, and discover a task relevant motion representation: Task Space Retrieval Using Inverse Feedback Control (TRIC). We use the learned latent costs to create motion with a feedback controller. We tested our method on robot grasping of objects, a challenging highdimensional task. TRIC learns the important control dimensions for the grasping task from a few example movements and is able to robustly approach and grasp objects in new situations.

PILCO: A ModelBased and DataEfficient Approach to Policy SearchIn this paper, we introduce PILCO, a practical, dataefficient modelbased policy search method. PILCO reduces model bias, one of the key problems of modelbased reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into longterm planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using stateoftheart approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and highdimensional control tasks.

Approximating Correlated Equilibria using Relaxations on the Marginal PolytopeIn game theory, a Correlated Equilibrium (CE) is an equilibrium concept that generalizes the more wellknown Nash Equilibrium. If the game is represented as a graphical game, the computational complexity of computing an optimum CE is exponential in the treewidth of the graph. In settings where this exact computation is not feasible, it is desirable to approximate the properties of the CE, such as its expected social utility and marginal probabilities. We study outer relaxations of this problem that yield approximate marginal strategies for the players under a variety of utility functions. Results on simulated games and in a real problem involving drug design indicate that our approximations can be highly accurate and can be successfully used when exact computation of CE is infeasible.

Generalized Value Functions for Large Action SetsThe majority of value function approximation based reinforcement learning algorithms available today, focus on approximating the state (V) or stateaction (Q) value function and efficient action selection comes as an afterthought. On the other hand, realworld problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to realworld problems. In this paper we present a unified view of V and Q functions and arrive at a new spaceefficient representation, where action selection can be done exponentially faster, without the use of a model. We then describe how to calculate this new value function efficiently via approximate linear programming and provide experimental results that demonstrate the effectiveness of the proposed approach.
SemiSupervised Learning

Vectorvalued Manifold RegularizationWe consider the general problem of learning an unknown functional dependency, f : X>Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of datadependent regularization in Vectorvalued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) which elegantly extend familiar scalarvalued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output interdependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrixvalued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multilabel image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.

Semisupervised Penalized Output Kernel Regression for Link PredictionLink prediction is addressed as an output kernel learning task through semisupervised Output Kernel Regression. Working in the framework of RKHS theory with vectorvalued functions, we establish a new representer theorem devoted to semisupervised least square regression. We then apply it to get a new model (POKR: Penalized Output Kernel Regression) and show its relevance using numerical experiments on artificial networks and two real applications using a very low percentage of labeled data in a transductive setting.

Access to Unlabeled Data can Speed up Prediction TimeSemisupervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements by the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the socalled cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from much fewer labeled examples than without such a sample.

Automatic Feature Decomposition for Single View CotrainingOne of the most successful semisupervised learning approaches is cotraining for multiview data. In cotraining, one trains two classifiers, one for each view, and uses the most confident predictions of the unlabeled data for the two classifiers to ``teach each other''. In this paper, we extend cotraining to learning scenarios without an explicit multiview representation. Inspired by a theoretical analysis of Balcan et. al (2004), we introduce a novel algorithm that splits the feature space during learning, explicitly to encourage cotraining to be successful. We demonstrate the efficacy of our proposed method in a weaklysupervised setting on the challenging Caltech256 object recognition task, where we improve significantly over previous results by (Bergamo & Torresani, 2010) in almost all trainingset size settings.

Towards Making Unlabeled Data Never HurtIt is usually expected that, when labeled data are limited, the learning performance can be improved by exploiting unlabeled data. In many cases, however, the performances of current semisupervised learning approaches may be even worse than purely using the limited labeled data.It is desired to have extit{safe} semisupervised learning approaches which never degenerate learning performance by using unlabeled data. In this paper, we focus on semisupervised support vector machines (S3VMs) and propose S4VMs, i.e., safe S3VMs. Unlike S3VMs which typically aim at approaching an optimal lowdensity separator, S4VMs try to exploit the candidate lowdensity separators simultaneously to reduce the risk of identifying a poor separator with unlabeled data. We describe two implementations of S4VMs, and our comprehensive experiments show that the overall performance of S4VMs are highly competitive to S3VMs, while in contrast to S3VMs which degenerate performance in many cases, S4VMs are never significantly inferior to inductive SVMs.
Kernel Methods and Optimization

Learning Output Kernels with Block Coordinate DescentWe propose a method to learn simultaneously a vectorvalued function and a kernel between its components. The obtained kernel can be used both to improve learning performances and to reveal structures in the output space which may be important in their own right. Our method is based on the solution of a suitable regularization problem over a reproducing kernel Hilbert space (RKHS) of vectorvalued functions. Although the regularized risk functional is nonconvex, we show that it is invex, implying that all local minimizers are global minimizers. We derive a blockwise coordinate descent method that efficiently exploits the structure of the objective functional. Then, we empirically demonstrate that the proposed method can improve classification accuracy. Finally, we provide a visual interpretation of the learned kernel matrix for some well known datasets.

Implementing regularization implicitly via approximate eigenvector computationRegularization is a powerful technique for extracting useful information from noisy data. Typically, it is implemented by adding some sort of norm constraint to an objective function and then exactly optimizing the modified objective function. This procedure often leads to optimization problems that are computationally more expensive than the original problem, a fact that is clearly problematic if one is interested in largescale applications. On the other hand, a large body of empirical work has demonstrated that heuristics, and in some cases approximation algorithms, developed to speed up computations sometimes have the sideeffect of performing regularization implicitly. Thus, we consider the question: What is the regularized optimization objective that an approximation algorithm is exactly optimizing? We address this question in the context of computing approximations to the smallest nontrivial eigenvector of a graph Laplacian; and we consider three randomwalkbased procedures: one based on the heat kernel of the graph, one based on computing the the PageRank vector associated with the graph, and one based on a truncated lazy random walk. In each case, we provide a precise characterization of the manner in which the approximation method can be viewed as implicitly computing the exact solution to a regularized problem. Interestingly, the regularization is not on the usual vector form of the optimization problem, but instead it is on a related semidefinite program.

Adaptive Kernel Approximation for LargeScale NonLinear SVM PredictionThe applicability of nonlinear support vector machines (SVMs) has been limited in largescale data collections because of their linear prediction complexity to the size of support vectors. We propose an efficient prediction algorithm with performance guarantee for nonlinear SVMs, termed AdaptSVM. It can selectively collapse the kernel function computation to a reduced set of support vectors, compensated by an additional correction term that can be easily computed online. It also allows adaptive fallback to original kernel computation based on its estimated variance and maximum error tolerance. In addition to theoretical analysis, we empirically evaluate on multiple largescale datasets to show that the proposed algorithm can speed up the prediction process up to 10000 times with only <0.5 accuracy loss.

Suboptimal Solution Path Algorithm for Support Vector MachineWe consider a suboptimal solution path algorithm for the Support Vector Machine. The solution path algorithm is known as an effective tool for solving a sequence of a parametrized optimization problems in machine learning. However, the algorithm needs to keep strict optimality conditions satisfied everywhere on the path. This requirement narrows the applicability of the path algorithm and adversely affects its computational efficiency. In our algorithm, user can specify tolerances to the optimality and control the tradeoff between accuracy of the solution and the computational cost. We also show that our suboptimal solutions can be interpreted as the solution of a perturbed optimization problem from the original one, provide some theoretical analyses of our algorithm based on a novel interpretation. The experimental results demonstrate the effectiveness of our algorithm in terms of efficiency and accuracy.
 All Sessions
 Keynotes
 Bandits and Online Learning
 Structured Output
 Reinforcement Learning
 Graphical Models and Optimization
 Recommendation and Matrix Factorization
 Neural Networks and Statistical Methods
 Learning Theory
 Feature Selection, Dimensionality Reduction
 Invited CrossConference Track
 LatentVariable Models
 LargeScale Learning
 Neural Networks and Deep Learning
 LatentVariable Models
 Active and Online Learning
 Tutorial : Collective Intelligence and Machine Learning
 Tutorial: Machine Learning in Ecological Science and Environmental Policy
 Tutorial: Machine Learning and Robotics
 Tutorial: Introduction to Bandits: Algorithms and Theory
 Ensemble Methods
 Tutorial: Machine Learning for Large Scale Recommender Systems
 Tutorial: Learning Kernels
 TestofTime
 Best Paper
 Robotics and Reinforcement Learning
 Transfer Learning
 Kernel Methods
 Optimization
 Learning Theory
 Invited CrossConference Session
 Neural Networks and Deep Learning
 Reinforcement Learning
 Bayesian Inference and Probabilistic Models
 Supervised Learning
 Social Networks
 Evaluation Metrics
 statistical relational learning
 Outlier Detection
 Time Series
 Graphical Models and Bayesian Inference
 Sparsity and Compressed Sensing
 Clustering
 Game Theory and Planning and Control
 SemiSupervised Learning
 Kernel Methods and Optimization
 Neural Networks and NLP
 Probabilistic Models & MCMC
 Online Learning
 Ranking and Information Retrieval