ICML 2011
TechTalks from event: ICML 2011
Reinforcement Learning
-
Structure Learning in Ergodic Factored MDPs without Knowledge of the Transition Function's In-DegreeThis paper introduces Learn Structure and Exploit RMax (LSE-RMax), a novel model based structure learning algorithm for ergodic factored-state MDPs. Given a planning horizon that satisfies a condition, LSE-RMax provably guarantees a return very close to the optimal return, with a high certainty, without requiring any prior knowledge of the in-degree of the transition function as input. LSE-RMax is fully implemented with a thorough analysis of its sample complexity. We also present empirical results demonstrating its effectiveness compared to prior approaches to the problem.
-
The Infinite Regionalized Policy RepresentationWe introduce the infinite regionalized policy presentation (iRPR), as a nonparametric policy for reinforcement learning in partially observable Markov decision processes (POMDPs). The iRPR assumes an unbounded set of decision states a priori, and infers the number of states to represent the policy given the experiences. We propose algorithms for learning the number of decision states while maintaining a proper balance between exploration and exploitation. Convergence analysis is provided, along with performance evaluations on benchmark problems.
-
Online Discovery of Feature DependenciesOnline representational expansion techniques have improved the learning speed of existing reinforcement learning (RL) algorithms in low dimensional domains, yet existing online expansion methods do not scale well to high dimensional problems. We conjecture that one of the main difficulties limiting this scaling is that features defined over the full-dimensional state space often generalize poorly. Hence, we introduce incremental Feature Dependency Discovery (iFDD) as a computationally-inexpensive method for representational expansion that can be combined with any online, value-based RL method that uses binary features. Unlike other online expansion techniques, iFDD creates new features in low dimensional subspaces of the full state space where feedback errors persist. We provide convergence and computational complexity guarantees for iFDD, as well as showing empirically that iFDD scales well to high dimensional multi-agent planning domains with hundreds of millions of state-action pairs.
-
Doubly Robust Policy Evaluation and LearningWe study decision making in environments where the reward is only partially observed, but can be modeled as a function of an action and an observed context. This setting, known as contextual bandits, encompasses a wide variety of applications including health-care policy and Internet advertising. A central task is evaluation of a new policy given historic data consisting of contexts, actions and received rewards. The key challenge is that the past data typically does not faithfully represent proportions of actions taken by a new policy. Previous approaches rely either on models of rewards or models of the past policy. The former are plagued by a large bias whereas the latter have a large variance. In this work, we leverage the strength and overcome the weaknesses of the two approaches by applying the emph{doubly robust} technique to the problems of policy evaluation and optimization. We prove that this approach yields accurate value estimates when we have emph{either} a good (but not necessarily consistent) model of rewards emph{or} a good (but not necessarily consistent) model of past policy. Extensive empirical comparison demonstrates that the doubly robust approach uniformly improves over existing techniques, achieving both lower variance in value estimation and better policies. As such, we expect the doubly robust approach to become common practice.
Graphical Models and Optimization
-
Dynamic Tree Block Coordinate AscentThis paper proposes a novel Linear Programming (LP) based algorithm, called Dynamic Tree-Block Coordinate Ascent (DTBCA), for performing maximum a posteriori (MAP) inference in probabilistic graphical models. Unlike traditional message passing algorithms, which operate uniformly on the whole factor graph, our method dynamically chooses regions of the factor graph on which to focus message-passing efforts. We propose two criteria for selecting regions, including an efficiently computable upper-bound on the increase in the objective possible by passing messages in any particular region. This bound is derived from the theory of primal-dual methods from combinatorial optimization, and the forest that maximizes the bounds can be chosen efficiently using a maximum-spanning-tree-like algorithm. Experimental results show that our dynamic schedules significantly speed up state-of-the- art LP-based message-passing algorithms on a wide variety of real-world problems.
-
Approximation Bounds for Inference using Cooperative CutsWe analyze a family of probability distributions that are characterized by an embedded combinatorial structure. This family includes models having arbitrary treewidth and arbitrary sized factors. Unlike general models with such freedom, where the “most probable explanation” (MPE) problem is inapproximable, the combinatorial structure within our model, in particular the indirect use of submodularity, leads to several MPE algorithms that all have approximation guarantees.
-
Convex Max-Product over Compact Sets for Protein FoldingIn this paper we present an approach to inference in graphical models with mixture of discrete and bounded continuous variables. In particular, we extend convex max-product to deal with these hybrid models and derive the conditions under which our approach is guaranteed to produce the MAP assignment. When dealing with continuous variables the messages are functions. We investigate a multi-grid approach which can be viewed as a piecewise constant representation of these messages. While this approach provides the theoretical guarantees it is not very practical. Inspired by this, we further propose a particle convex max-product algorithm that significantly outperforms existing particle methods in the task of protein folding and performs comparable to the state-of-the art while using a smaller amount of prior knowledge.
-
On the Use of Variational Inference for Learning Discrete Graphical ModelsWe study the general class of estimators for graphical model structure based on optimizing $ell_1$-regularized approximate log-likelihood, where the approximate likelihood uses tractable variational approximations of the partition function. We provide a message-passing algorithm that emph{directly} computes the $ell_1$ regularized approximate MLE. Further, in the case of certain reweighted entropy approximations to the partition function, we show that surprisingly the $ell_1$ regularized approximate MLE estimator has a emph{closed-form}, so that we would no longer need to run through many iterations of approximate inference and message-passing. Lastly, we analyze this general class of estimators for graph structure recovery, or its emph{sparsistency}, and show that it is indeed sparsistent under certain conditions.
Recommendation and Matrix Factorization
-
GoDec: Randomized Low-rank & Sparse Matrix Decomposition in Noisy CaseLow-rank and sparse structures have been profoundly studied in matrix completion and compressed sensing. In this paper, we develop ``Go Decomposition'' (GoDec) to efficiently and robustly estimate the low-rank part $L$ and the sparse part $S$ of a matrix $X=L+S+G$ with noise $G$. GoDec alternatively assigns the low-rank approximation of $X-S$ to $L$ and the sparse approximation of $X-L$ to $S$. The algorithm can be significantly accelerated by bilateral random projections (BRP). We also propose GoDec for matrix completion as an important variant. We prove that the objective value $|X-L-S|_F^2$ converges to a local minimum, while $L$ and $S$ linearly converge to local optimums. Theoretically, we analyze the influence of $L$, $S$ and $G$ to the asymptotic/convergence speeds in order to discover the robustness of GoDec. Empirical studies suggest the efficiency, robustness and effectiveness of GoDec comparing with representative matrix decomposition and completion tools, e.g., Robust PCA and OptSpace.
-
Large-Scale Convex Minimization with a Low-Rank ConstraintWe address the problem of minimizing a convex function over the space of large matrices with low rank. While this optimization problem is hard in general, we propose an efficient greedy algorithm and derive its formal approximation guarantees. Each iteration of the algorithm involves (approximately) finding the left and right singular vectors corresponding to the largest singular value of a certain matrix, which can be calculated in linear time. This leads to an algorithm which can scale to large matrices arising in several applications such as matrix completion for collaborative filtering and robust low rank matrix approximation.
-
Linear Regression under Fixed-Rank Constraints: A Riemannian ApproachIn this paper, we tackle the problem of learning a linear regression model whose parameter is a fixed-rank matrix. We study the Riemannian manifold geometry of the set of fixed-rank matrices and develop efficient line-search algorithms. The proposed algorithms have many applications, scale to high-dimensional problems, enjoy local convergence properties and confer a geometric basis to recent contributions on learning fixed-rank matrices. Numerical experiments on benchmarks suggest that the proposed algorithms compete with the state-of-the-art, and that manifold optimization offers a versatile framework for the design of rank-constrained machine learning algorithms.
-
Clustering by Left-Stochastic Matrix FactorizationWe propose clustering samples given their pairwise similarities by factorizing the similarity matrix into the product of a cluster probability matrix and its transpose. We propose a rotation-based algorithm to compute this left-stochastic decomposition (LSD). Theoretical results link the LSD clustering method to a soft kernel k-means clustering, give conditions for when the factorization and clustering are unique, and provide error bounds. Experimental results on simulated and real similarity datasets show that the proposed method reliably provides accurate clusterings.
- All Sessions
- Keynotes
- Bandits and Online Learning
- Structured Output
- Reinforcement Learning
- Graphical Models and Optimization
- Recommendation and Matrix Factorization
- Neural Networks and Statistical Methods
- Latent-Variable Models
- Large-Scale Learning
- Learning Theory
- Feature Selection, Dimensionality Reduction
- Invited Cross-Conference Track
- Neural Networks and Deep Learning
- Latent-Variable Models
- Active and Online Learning
- Tutorial : Collective Intelligence and Machine Learning
- Tutorial: Machine Learning in Ecological Science and Environmental Policy
- Tutorial: Machine Learning and Robotics
- Ensemble Methods
- Tutorial: Introduction to Bandits: Algorithms and Theory
- Tutorial: Machine Learning for Large Scale Recommender Systems
- Tutorial: Learning Kernels
- Test-of-Time
- Best Paper
- Robotics and Reinforcement Learning
- Transfer Learning
- Kernel Methods
- Optimization
- Learning Theory
- Invited Cross-Conference Session
- Neural Networks and Deep Learning
- Reinforcement Learning
- Bayesian Inference and Probabilistic Models
- Supervised Learning
- Social Networks
- Evaluation Metrics
- statistical relational learning
- Outlier Detection
- Time Series
- Graphical Models and Bayesian Inference
- Sparsity and Compressed Sensing
- Clustering
- Game Theory and Planning and Control
- Semi-Supervised Learning
- Kernel Methods and Optimization
- Neural Networks and NLP
- Probabilistic Models & MCMC
- Online Learning
- Ranking and Information Retrieval