TechTalks from event: ICML 2011

Reinforcement Learning

  • Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-Degree Authors: Doran Chakraborty; Peter Stone
    This paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model-based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a certain condition, LSE-RMax provably guarantees a return very close to the optimal return with high certainty, without requiring any prior knowledge of the in-degree of the transition function as input. We fully implement LSE-RMax and provide a thorough analysis of its sample complexity. We also present empirical results demonstrating its effectiveness compared to prior approaches to the problem.
  • The Infinite Regionalized Policy Representation Authors: Miao Liu; Xuejun Liao; Lawrence Carin
    We introduce the infinite regionalized policy representation (iRPR), a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori and infers from experience the number of states needed to represent the policy. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems.
  • Online Discovery of Feature Dependencies Authors: Alborz Geramifard; Finale Doshi; Joshua Redding; Nicholas Roy; Jonathan How
    Online representational expansion techniques have improved the learning speed of existing reinforcement learning (RL) algorithms in low-dimensional domains, yet existing online expansion methods do not scale well to high-dimensional problems. We conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduce incremental Feature Dependency Discovery (iFDD), a computationally inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low-dimensional subspaces of the full state space where feedback errors persist. We provide convergence and computational complexity guarantees for iFDD, and show empirically that iFDD scales well to high-dimensional multi-agent planning domains with hundreds of millions of state-action pairs. (A minimal sketch of the feature-discovery step appears after this list.)
  • Doubly Robust Policy Evaluation and Learning Authors: Miroslav Dudik; John Langford; Lihong Li
    We study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent the proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strengths and overcome the weaknesses of the two approaches by applying the doubly robust technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have either a good (but not necessarily consistent) model of rewards or a good (but not necessarily consistent) model of the past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice. (A sketch of the estimator appears after this list.)
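
A minimal sketch of the iFDD discovery step from "Online Discovery of Feature Dependencies", in Python. The class name, the scalar threshold, and the bookkeeping are illustrative assumptions; refinements from the paper, such as sorted discovery and feature sparsification, are omitted:

    from itertools import combinations

    class IFDD:
        """Sketch of iFDD-style feature discovery over binary features.

        Accumulates the absolute TD error observed while each pair of
        features is jointly active; when a pair's total crosses the
        threshold, the conjunction is promoted to a new feature.
        """
        def __init__(self, threshold):
            self.threshold = threshold
            self.relevance = {}    # candidate pair -> accumulated |TD error|
            self.features = set()  # discovered conjunctive features

        def update(self, active_features, td_error):
            discovered = []
            for pair in combinations(sorted(active_features), 2):
                if pair in self.features:
                    continue
                self.relevance[pair] = self.relevance.get(pair, 0.0) + abs(td_error)
                if self.relevance[pair] > self.threshold:
                    self.features.add(pair)
                    discovered.append(pair)
            return discovered  # new features to add to the representation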
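
And a minimal sketch of the doubly robust estimator from "Doubly Robust Policy Evaluation and Learning", for a deterministic target policy. The function names and data layout are assumptions, but the estimator follows the standard doubly robust form: a reward model's prediction plus an importance-weighted correction on matching actions:

    def doubly_robust_value(contexts, actions, rewards, propensities,
                            reward_model, policy):
        """Doubly robust off-policy value estimate (sketch).

        propensities[i] is the estimated logging probability of the
        logged action, p_hat(a_i | x_i); reward_model(x, a) predicts
        the reward; policy(x) is the deterministic policy to evaluate.
        """
        total = 0.0
        for x, a, r, p in zip(contexts, actions, rewards, propensities):
            pi_a = policy(x)                # action the new policy would take
            total += reward_model(x, pi_a)  # direct-method (reward model) term
            if a == pi_a:                   # correction is nonzero only when
                total += (r - reward_model(x, a)) / p  # the logged action matches
        return total / len(rewards)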

Graphical Models and Optimization

  • Dynamic Tree Block Coordinate Ascent Authors: Daniel Tarlow; Dhruv Batra; Pushmeet Kohli; Vladimir Kolmogorov
    This paper proposes a novel Linear Programming (LP) based algorithm, called Dynamic Tree-Block Coordinate Ascent (DTBCA), for performing maximum a posteriori (MAP) inference in probabilistic graphical models. Unlike traditional message passing algorithms, which operate uniformly on the whole factor graph, our method dynamically chooses regions of the factor graph on which to focus message-passing efforts. We propose two criteria for selecting regions, including an efficiently computable upper bound on the increase in the objective possible by passing messages in any particular region. This bound is derived from the theory of primal-dual methods from combinatorial optimization, and the forest that maximizes the bounds can be chosen efficiently using a maximum-spanning-tree-like algorithm. Experimental results show that our dynamic schedules significantly speed up state-of-the-art LP-based message-passing algorithms on a wide variety of real-world problems.
  • Approximation Bounds for Inference using Cooperative Cuts Authors: Stefanie Jegelka; Jeff Bilmes
    We analyze a family of probability distributions that are characterized by an embedded combinatorial structure. This family includes models having arbitrary treewidth and arbitrarily sized factors. Unlike general models with such freedom, where the “most probable explanation” (MPE) problem is inapproximable, the combinatorial structure within our model, in particular the indirect use of submodularity, leads to several MPE algorithms that all have approximation guarantees.
  • Convex Max-Product over Compact Sets for Protein Folding Authors: Jian Peng; Tamir Hazan; David McAllester; Raquel Urtasun
    In this paper we present an approach to inference in graphical models with a mixture of discrete and bounded continuous variables. In particular, we extend convex max-product to deal with these hybrid models and derive the conditions under which our approach is guaranteed to produce the MAP assignment. When dealing with continuous variables the messages are functions. We investigate a multi-grid approach, which can be viewed as a piecewise-constant representation of these messages. While this approach provides theoretical guarantees, it is not very practical. Inspired by this, we further propose a particle convex max-product algorithm that significantly outperforms existing particle methods in the task of protein folding and performs comparably to the state of the art while using a smaller amount of prior knowledge.
  • On the Use of Variational Inference for Learning Discrete Graphical Models Authors: Eunho Yang; Pradeep Ravikumar
    We study the general class of estimators for graphical model structure based on optimizing $\ell_1$-regularized approximate log-likelihood, where the approximate likelihood uses tractable variational approximations of the partition function. We provide a message-passing algorithm that directly computes the $\ell_1$-regularized approximate MLE. Further, in the case of certain reweighted entropy approximations to the partition function, we show that surprisingly the $\ell_1$-regularized approximate MLE estimator has a closed form, so that we no longer need to run many iterations of approximate inference and message-passing. Lastly, we analyze this general class of estimators for graph structure recovery, i.e., sparsistency, and show that it is indeed sparsistent under certain conditions.

Recommendation and Matrix Factorization

  • GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy Case Authors: Tianyi Zhou; Dacheng Tao
    Low-rank and sparse structures have been profoundly studied in matrix completion and compressed sensing. In this paper, we develop "Go Decomposition" (GoDec) to efficiently and robustly estimate the low-rank part $L$ and the sparse part $S$ of a matrix $X = L + S + G$ with noise $G$. GoDec alternately assigns the low-rank approximation of $X-S$ to $L$ and the sparse approximation of $X-L$ to $S$. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value $\|X-L-S\|_F^2$ converges to a local minimum, while $L$ and $S$ converge linearly to local optima. Theoretically, we analyze the influence of $L$, $S$ and $G$ on the asymptotic/convergence speeds in order to establish the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec compared with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace. (A naive sketch of the GoDec iteration appears after this list.)
  • Large-Scale Convex Minimization with a Low-Rank Constraint Authors: Shai Shalev-Shwartz; Alon Gonen; Ohad Shamir
    We address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding to the largest singular value of a certain matrix, which can be calculated in linear time. This leads to an algorithm that can scale to large matrices arising in several applications, such as matrix completion for collaborative filtering and robust low-rank matrix approximation. (A sketch of the greedy rank-one scheme appears after this list.)
  • Linear Regression under Fixed-Rank Constraints: A Riemannian Approach Authors: Gilles Meyer; Silvère Bonnabel; Rodolphe Sepulchre
    In this paper, we tackle the problem of learning a linear regression model whose parameter is a fixed-rank matrix. We study the Riemannian manifold geometry of the set of fixed-rank matrices and develop efficient line-search algorithms. The proposed algorithms have many applications, scale to high-dimensional problems, enjoy local convergence properties and confer a geometric basis to recent contributions on learning fixed-rank matrices. Numerical experiments on benchmarks suggest that the proposed algorithms compete with the state of the art, and that manifold optimization offers a versatile framework for the design of rank-constrained machine learning algorithms. (A simplified sketch of the rank constraint appears after this list.)
  • Clustering by Left-Stochastic Matrix Factorization Authors: Raman Arora; Maya Gupta; Amol Kapila; Maryam Fazel
    We propose clustering samples given their pairwise similarities by factorizing the similarity matrix into the product of a cluster probability matrix and its transpose. We propose a rotation-based algorithm to compute this left-stochastic decomposition (LSD). Theoretical results link the LSD clustering method to a soft kernel k-means clustering, give conditions for when the factorization and clustering are unique, and provide error bounds. Experimental results on simulated and real similarity datasets show that the proposed method reliably provides accurate clusterings. (An illustrative sketch appears after this list.)
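
A naive sketch of the GoDec iteration from "GoDec", in Python. It uses an exact truncated SVD for the low-rank step, where the paper substitutes bilateral random projections (BRP) for speed; the parameter names are illustrative:

    import numpy as np

    def godec(X, rank, card, iters=50):
        """Alternate low-rank and sparse projections (GoDec sketch).

        rank: target rank of L; card: number of nonzeros kept in S.
        """
        L = np.zeros_like(X)
        S = np.zeros_like(X)
        for _ in range(iters):
            # L-step: best rank-`rank` approximation of X - S
            U, s, Vt = np.linalg.svd(X - S, full_matrices=False)
            L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            # S-step: keep the `card` largest-magnitude entries of X - L
            R = X - L
            thresh = np.sort(np.abs(R), axis=None)[-card]
            S = np.where(np.abs(R) >= thresh, R, 0.0)
        return L, S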
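
For "Large-Scale Convex Minimization with a Low-Rank Constraint", a sketch of the greedy scheme: each iteration adds the rank-one direction given by the leading singular pair of the negative gradient, found here by power iteration. The backtracking step size is a simplification of the paper's per-iteration optimization, and all names are assumptions:

    import numpy as np

    def greedy_low_rank(f, grad_f, shape, max_rank, power_iters=50):
        """Greedy rank-one updates for min f(W) s.t. rank(W) <= max_rank."""
        W = np.zeros(shape)
        for _ in range(max_rank):
            G = grad_f(W)
            # Power iteration for the leading singular pair of -G.
            v = np.random.randn(shape[1])
            v /= np.linalg.norm(v)
            for _ in range(power_iters):
                u = -G @ v
                u /= np.linalg.norm(u)
                v = -G.T @ u
                v /= np.linalg.norm(v)
            D = np.outer(u, v)  # rank-one search direction
            eta, best = 1.0, f(W)
            while eta > 1e-6:   # backtracking line search
                if f(W + eta * D) < best:
                    W = W + eta * D
                    break
                eta *= 0.5
        return W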
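
For "Linear Regression under Fixed-Rank Constraints", a projected-gradient sketch that only illustrates the rank constraint: a Euclidean gradient step on the squared loss followed by retraction to the fixed-rank set via truncated SVD. The paper's actual contribution is Riemannian line-search algorithms; the model y_i = <X_i, W> and the parameter names here are assumptions:

    import numpy as np

    def fixed_rank_regression(Xs, ys, rank, lr=0.01, iters=200):
        """Fit y_i = <X_i, W> with rank(W) fixed (sketch).

        Xs has shape (n, p, q); ys has shape (n,).
        """
        n, p, q = Xs.shape
        W = np.zeros((p, q))
        for _ in range(iters):
            preds = np.tensordot(Xs, W, axes=([1, 2], [0, 1]))
            grad = np.tensordot(preds - ys, Xs, axes=(0, 0)) / n
            W = W - lr * grad
            # Retraction: truncate back to the fixed-rank set.
            U, s, Vt = np.linalg.svd(W, full_matrices=False)
            W = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        return W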
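
For "Clustering by Left-Stochastic Matrix Factorization", an illustrative projected-gradient substitute for the paper's rotation-based algorithm: it fits a k x n matrix P with probability-vector columns so that P^T P approximates the similarity matrix K (the paper's scaling constant is dropped), then assigns each sample to its most probable cluster:

    import numpy as np

    def project_simplex(v):
        """Euclidean projection of a vector onto the probability simplex."""
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
        return np.maximum(v + (1.0 - css[rho]) / (rho + 1.0), 0.0)

    def lsd_cluster(K, k, lr=0.001, iters=500):
        """Left-stochastic decomposition sketch: K ~ P.T @ P."""
        n = K.shape[0]
        P = np.random.rand(k, n)
        P /= P.sum(axis=0)                  # start left-stochastic
        for _ in range(iters):
            grad = 4.0 * P @ (P.T @ P - K)  # gradient of ||K - P^T P||_F^2, symmetric K
            P = np.apply_along_axis(project_simplex, 0, P - lr * grad)
        return P.argmax(axis=0)             # hard cluster assignments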