TechTalks from event: ICML 2011

Game Theory and Planning and Control

  • Integrating Partial Model Knowledge in Model Free RL Algorithms Authors: Aviv Tamar; Dotan Di Castro; Ron Meir
    In reinforcement learning an agent uses online feedback from the environment and prior knowledge in order to adaptively select an effective policy. Model free approaches address this task by directly mapping external and internal states to actions, while model based methods attempt to construct a model of the environment, followed by a selection of optimal actions based on that model. Given the complementary advantages of both approaches, we suggest a novel algorithm which combines them into a single algorithm, which switches between a model based and a model free mode, depending on the current environmental state and on the status of the agent's knowledge. We prove that such an approach leads to improved performance whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach and suggest its efficacy in boosting policy gradient learning.
  • Task Space Retrieval Using Inverse Feedback Control Authors: Nikolay Jetchev; Marc Toussaint
    Learning complex skills by repeating and generalizing expert behavior is a fundamental problem in robotics. A common approach is learning from demonstration: given examples of correct motions, learn a policy mapping state to action consistent with the training data. However, the usual approaches do not answer the question of what are appropriate representations to generate motions for specific tasks. Inspired by Inverse Optimal Control, we present a novel method to learn latent costs, imitate and generalize demonstrated behavior, and discover a task relevant motion representation: Task Space Retrieval Using Inverse Feedback Control (TRIC). We use the learned latent costs to create motion with a feedback controller. We tested our method on robot grasping of objects, a challenging high-dimensional task. TRIC learns the important control dimensions for the grasping task from a few example movements and is able to robustly approach and grasp objects in new situations.
  • PILCO: A Model-Based and Data-Efficient Approach to Policy Search Authors: Marc Deisenroth; Carl Rasmussen
    In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
  • Approximating Correlated Equilibria using Relaxations on the Marginal Polytope Authors: Hetunandan Kamisetty; Eric Xing; Christopher Langmead
    In game theory, a Correlated Equilibrium (CE) is an equilibrium concept that generalizes the more well-known Nash Equilibrium. If the game is represented as a graphical game, the computational complexity of computing an optimum CE is exponential in the tree-width of the graph. In settings where this exact computation is not feasible, it is desirable to approximate the properties of the CE, such as its expected social utility and marginal probabilities. We study outer relaxations of this problem that yield approximate marginal strategies for the players under a variety of utility functions. Results on simulated games and in a real problem involving drug design indicate that our approximations can be highly accurate and can be successfully used when exact computation of CE is infeasible.
  • Generalized Value Functions for Large Action Sets Authors: Jason Pazis; Ron Parr
    The majority of value function approximation based reinforcement learning algorithms available today, focus on approximating the state (V) or state-action (Q) value function and efficient action selection comes as an afterthought. On the other hand, real-world problems tend to have large action spaces, where evaluating every possible action becomes impractical. This mismatch presents a major obstacle in successfully applying reinforcement learning to real-world problems. In this paper we present a unified view of V and Q functions and arrive at a new space-efficient representation, where action selection can be done exponentially faster, without the use of a model. We then describe how to calculate this new value function efficiently via approximate linear programming and provide experimental results that demonstrate the effectiveness of the proposed approach.

Semi-Supervised Learning

  • Vector-valued Manifold Regularization Authors: Ha Quang Minh; Vikas Sindhwani
    We consider the general problem of learning an unknown functional dependency, f : X->Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in Vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) which elegantly extend familiar scalar-valued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output inter-dependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multilabel image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.
  • Semi-supervised Penalized Output Kernel Regression for Link Prediction Authors: Céline Brouard; Florence D'Alche-Buc; Marie Szafranski
    Link prediction is addressed as an output kernel learning task through semi-supervised Output Kernel Regression. Working in the framework of RKHS theory with vector-valued functions, we establish a new representer theorem devoted to semi-supervised least square regression. We then apply it to get a new model (POKR: Penalized Output Kernel Regression) and show its relevance using numerical experiments on artificial networks and two real applications using a very low percentage of labeled data in a transductive setting.
  • Access to Unlabeled Data can Speed up Prediction Time Authors: Ruth Urner; Shai Shalev-Shwartz; Shai Ben-David
    Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements by the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from much fewer labeled examples than without such a sample.
  • Automatic Feature Decomposition for Single View Co-training Authors: Minmin Chen; Kilian Weinberger; Yixin Chen
    One of the most successful semi-supervised learning approaches is co-training for multi-view data. In co-training, one trains two classifiers, one for each view, and uses the most confident predictions of the unlabeled data for the two classifiers to ``teach each other''. In this paper, we extend co-training to learning scenarios without an explicit multi-view representation. Inspired by a theoretical analysis of Balcan et. al (2004), we introduce a novel algorithm that splits the feature space during learning, explicitly to encourage co-training to be successful. We demonstrate the efficacy of our proposed method in a weakly-supervised setting on the challenging Caltech-256 object recognition task, where we improve significantly over previous results by (Bergamo & Torresani, 2010) in almost all training-set size settings.
  • Towards Making Unlabeled Data Never Hurt Authors: Yu-Feng Li; Zhi-Hua Zhou
    It is usually expected that, when labeled data are limited, the learning performance can be improved by exploiting unlabeled data. In many cases, however, the performances of current semi-supervised learning approaches may be even worse than purely using the limited labeled data.It is desired to have extit{safe} semi-supervised learning approaches which never degenerate learning performance by using unlabeled data. In this paper, we focus on semi-supervised support vector machines (S3VMs) and propose S4VMs, i.e., safe S3VMs. Unlike S3VMs which typically aim at approaching an optimal low-density separator, S4VMs try to exploit the candidate low-density separators simultaneously to reduce the risk of identifying a poor separator with unlabeled data. We describe two implementations of S4VMs, and our comprehensive experiments show that the overall performance of S4VMs are highly competitive to S3VMs, while in contrast to S3VMs which degenerate performance in many cases, S4VMs are never significantly inferior to inductive SVMs.

Kernel Methods and Optimization

  • Learning Output Kernels with Block Coordinate Descent Authors: Francesco Dinuzzo; Cheng Soon Ong; Peter Gehler; Gianluigi Pillonetto
    We propose a method to learn simultaneously a vector-valued function and a kernel between its components. The obtained kernel can be used both to improve learning performances and to reveal structures in the output space which may be important in their own right. Our method is based on the solution of a suitable regularization problem over a reproducing kernel Hilbert space (RKHS) of vector-valued functions. Although the regularized risk functional is non-convex, we show that it is invex, implying that all local minimizers are global minimizers. We derive a block-wise coordinate descent method that efficiently exploits the structure of the objective functional. Then, we empirically demonstrate that the proposed method can improve classification accuracy. Finally, we provide a visual interpretation of the learned kernel matrix for some well known datasets.
  • Implementing regularization implicitly via approximate eigenvector computation Authors: Michael Mahoney; Lorenzo Orecchia
    Regularization is a powerful technique for extracting useful information from noisy data. Typically, it is implemented by adding some sort of norm constraint to an objective function and then exactly optimizing the modified objective function. This procedure often leads to optimization problems that are computationally more expensive than the original problem, a fact that is clearly problematic if one is interested in large-scale applications. On the other hand, a large body of empirical work has demonstrated that heuristics, and in some cases approximation algorithms, developed to speed up computations sometimes have the side-effect of performing regularization implicitly. Thus, we consider the question: What is the regularized optimization objective that an approximation algorithm is exactly optimizing? We address this question in the context of computing approximations to the smallest nontrivial eigenvector of a graph Laplacian; and we consider three random-walk-based procedures: one based on the heat kernel of the graph, one based on computing the the PageRank vector associated with the graph, and one based on a truncated lazy random walk. In each case, we provide a precise characterization of the manner in which the approximation method can be viewed as implicitly computing the exact solution to a regularized problem. Interestingly, the regularization is not on the usual vector form of the optimization problem, but instead it is on a related semidefinite program.
  • Adaptive Kernel Approximation for Large-Scale Non-Linear SVM Prediction Authors: Michele Cossalter; Rong Yan; Lu Zheng
    The applicability of non-linear support vector machines (SVMs) has been limited in large-scale data collections because of their linear prediction complexity to the size of support vectors. We propose an efficient prediction algorithm with performance guarantee for non-linear SVMs, termed AdaptSVM. It can selectively collapse the kernel function computation to a reduced set of support vectors, compensated by an additional correction term that can be easily computed on-line. It also allows adaptive fall-back to original kernel computation based on its estimated variance and maximum error tolerance. In addition to theoretical analysis, we empirically evaluate on multiple large-scale datasets to show that the proposed algorithm can speed up the prediction process up to 10000 times with only <0.5 accuracy loss.
  • Suboptimal Solution Path Algorithm for Support Vector Machine Authors: Masayuki Karasuyama; Ichiro Takeuchi
    We consider a suboptimal solution path algorithm for the Support Vector Machine. The solution path algorithm is known as an effective tool for solving a sequence of a parametrized optimization problems in machine learning. However, the algorithm needs to keep strict optimality conditions satisfied everywhere on the path. This requirement narrows the applicability of the path algorithm and adversely affects its computational efficiency. In our algorithm, user can specify tolerances to the optimality and control the trade-off between accuracy of the solution and the computational cost. We also show that our suboptimal solutions can be interpreted as the solution of a perturbed optimization problem from the original one, provide some theoretical analyses of our algorithm based on a novel interpretation. The experimental results demonstrate the effectiveness of our algorithm in terms of efficiency and accuracy.
  • Functional Regularized Least Squares Classication with Operator-valued Kernels Authors: Hachem Kadri; Asma Rabaoui; Philippe Preux; Emmanuel Duflos; Alain Rakotomamonjy