ICML 2011
Probabilistic Models & MCMC
-
Probabilistic Matrix Addition
We introduce Probabilistic Matrix Addition (PMA) for modeling real-valued data matrices by simultaneously capturing covariance structure among rows and among columns. PMA additively combines two latent matrices drawn from two Gaussian Processes respectively over rows and columns. The resulting joint distribution over the observed matrix does not factorize over entries, rows, or columns, and can thus capture intricate dependencies in the matrix. Exact inference in PMA is possible, but involves inversion of large matrices, and can be computationally prohibitive. Efficient approximate inference is possible due to the sparse dependency structure among latent variables. We propose two families of approximate inference algorithms for PMA based on Gibbs sampling and MAP inference. We demonstrate the effectiveness of PMA for missing value prediction and multi-label classification problems.
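A minimal NumPy sketch of the additive construction, under loudly stated assumptions: squared-exponential (RBF) kernels over row and column indices, an extra i.i.d. Gaussian noise term, and a small Cholesky jitter. All function names and parameters are illustrative, not the authors' code. Each column of one latent matrix is drawn from the row GP, each row of the other from the column GP, and the two are summed. The exact covariance of the vectorized matrix is then a Kronecker *sum*, not a single Kronecker product, which is why the entries do not factorize. Exact imputation by Gaussian conditioning needs a dense solve of cost O((nm)^3), which illustrates why exact inference gets prohibitive.

```python
import numpy as np

def rbf_kernel(locs, lengthscale=1.0):
    # Assumption: squared-exponential kernel over 1-D row/column locations.
    d = locs[:, None] - locs[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def pma_covariance(K_row, K_col, noise_var=0.01):
    # Covariance of vec(X) (column-major): I_m (x) K_row + K_col (x) I_n + noise.
    n, m = K_row.shape[0], K_col.shape[0]
    return (np.kron(np.eye(m), K_row)
            + np.kron(K_col, np.eye(n))
            + noise_var * np.eye(n * m))

def pma_sample(K_row, K_col, noise_var=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, m = K_row.shape[0], K_col.shape[0]
    L_r = np.linalg.cholesky(K_row + 1e-6 * np.eye(n))  # jitter is an assumption
    L_c = np.linalg.cholesky(K_col + 1e-6 * np.eye(m))
    Z = L_r @ rng.standard_normal((n, m))    # each column ~ N(0, K_row)
    W = rng.standard_normal((n, m)) @ L_c.T  # each row ~ N(0, K_col)
    return Z + W + np.sqrt(noise_var) * rng.standard_normal((n, m))

def pma_impute_exact(X, observed, K_row, K_col, noise_var=0.01):
    # Exact Gaussian conditioning on the observed entries: O((nm)^3) solve.
    n, m = X.shape
    S = pma_covariance(K_row, K_col, noise_var)
    x = X.flatten(order="F")
    obs = observed.flatten(order="F")
    alpha = np.linalg.solve(S[np.ix_(obs, obs)], x[obs])
    mean = S[:, obs] @ alpha  # observed entries are reproduced exactly
    return mean.reshape((n, m), order="F")
```

For example, with `K_r = rbf_kernel(np.arange(4.0))` and `K_c = rbf_kernel(np.arange(3.0))`, a matrix from `pma_sample(K_r, K_c)` with one entry masked out can be filled in by `pma_impute_exact`. The paper's Gibbs and MAP algorithms exploit the sparse latent structure instead of forming this dense covariance.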
-
SampleRank: Training Factor Graphs with Atomic Gradients
We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23% error reduction on noun-phrase coreference).
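The sketch below shows a simplified, perceptron-style version of an atomic, loss-driven update between two neighbouring MCMC states. When the user-provided loss prefers one state but the model score does not rank that state higher, the parameters move toward the preferred state's features. Everything in the toy setup is a hypothetical assumption for illustration: a state is a binary labeling, features are the summed feature vectors of tokens labelled 1, the loss is Hamming distance to the truth, and proposals flip one label and are accepted by Metropolis. This is not the authors' implementation.

```python
import numpy as np

def samplerank_update(theta, phi_a, phi_b, loss_a, loss_b, lr=1.0, margin=0.0):
    # No gradient when the loss does not prefer either state.
    if loss_a == loss_b:
        return theta
    better, worse = (phi_a, phi_b) if loss_a < loss_b else (phi_b, phi_a)
    # Update only if the model fails to rank the better state higher by `margin`.
    if theta @ (better - worse) <= margin:
        theta = theta + lr * (better - worse)
    return theta

def train_toy(features, y_true, n_steps=500, seed=0):
    # Hypothetical toy task: label each token 0/1; features has shape (n, d).
    rng = np.random.default_rng(seed)
    n, d = features.shape
    theta = np.zeros(d)
    y = np.zeros(n, dtype=int)                      # chain starts at all-zeros
    phi = lambda lab: lab @ features                # sum of features of 1-labelled tokens
    loss = lambda lab: int(np.sum(lab != y_true))   # user-provided Hamming loss
    for _ in range(n_steps):
        i = rng.integers(n)
        y_new = y.copy()
        y_new[i] = 1 - y_new[i]                     # proposal: flip one label
        # Atomic update between the current state and the proposal.
        theta = samplerank_update(theta, phi(y), phi(y_new), loss(y), loss(y_new))
        # Metropolis acceptance under the current model score theta . phi(y).
        if np.log(rng.random()) < theta @ (phi(y_new) - phi(y)):
            y = y_new
    return theta
```

Because an update can be computed for any pair of states the chain visits, no chain has to be run to convergence before a gradient is available.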
-
A New Bayesian Rating System for Team Competitions
We present a novel probabilistic rating system for team competitions. Building upon TrueSkill™, we change the factor graph structure to address limitations of TrueSkill™ such as multiway ties and variable team sizes. We give detailed inference algorithms for the new structure. Experimental results show a significant improvement over TrueSkill™.
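For context, here is a sketch of the baseline TrueSkill™ two-team win/loss update that this work builds on, where a team's performance is the sum of its players' performances. It assumes no draw margin and no dynamics noise, and uses the common defaults mu = 25, sigma = 25/3, beta = 25/6. It does not reproduce the paper's new factor graph for multiway ties. The function names are illustrative.

```python
import math

def _pdf(t):
    # Standard normal density.
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def _cdf(t):
    # Standard normal cumulative distribution function.
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def trueskill_two_team_update(winners, losers, beta=25.0 / 6.0):
    """winners, losers: lists of (mu, sigma) per player; teams may differ in size."""
    players = winners + losers
    # Variance of the team performance difference: skill variances plus per-player noise.
    c2 = sum(s ** 2 for _, s in players) + len(players) * beta ** 2
    c = math.sqrt(c2)
    t = (sum(m for m, _ in winners) - sum(m for m, _ in losers)) / c
    v = _pdf(t) / _cdf(t)   # mean correction for a win with no draw margin
    w = v * (v + t)         # variance correction, lies in (0, 1)

    def update(team, sign):
        return [(m + sign * s ** 2 / c * v,
                 s * math.sqrt(1.0 - s ** 2 / c2 * w)) for m, s in team]

    return update(winners, +1.0), update(losers, -1.0)
```

A two-player team beating a three-player team, all starting at (25, 25/3), raises every winner's mean, lowers every loser's mean, and shrinks all uncertainties. Handling ties among several teams is exactly where the paper departs from this baseline.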
-
Bayesian Learning via Stochastic Gradient Langevin Dynamics
In this paper we propose a new framework for learning from large-scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm, we show that the iterates converge to samples from the true posterior distribution as the step size is annealed. This seamless transition between optimization and Bayesian posterior sampling provides built-in protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics that monitors a "sampling threshold" and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression, and ICA with natural gradients.
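A minimal sketch of the update on a toy problem, assuming a 1-D posterior over the mean of unit-variance Gaussian data with a Gaussian prior. Each step adds half a step size times the prior gradient plus the minibatch likelihood gradient rescaled by N/n, then injects Gaussian noise with variance equal to the step size, and the step size decays as a(b + t)^(-gamma). The toy model, the constants, sampling minibatches with replacement, and the burn-in rule (drop the first half) are illustrative assumptions. A plain average replaces the paper's sampling-threshold estimator.

```python
import numpy as np

def sgld_gaussian_mean(data, n_iters=5000, batch_size=10, a=1e-3, b=1.0,
                       gamma=0.55, prior_var=10.0, seed=0):
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = 0.0
    samples = np.empty(n_iters)
    for t in range(n_iters):
        eps = a * (b + t) ** (-gamma)                       # annealed step size
        batch = data[rng.integers(0, N, size=batch_size)]   # minibatch (with replacement)
        grad_prior = -theta / prior_var                     # d/dtheta log N(theta; 0, prior_var)
        grad_lik = (N / batch_size) * np.sum(batch - theta) # rescaled minibatch gradient
        noise = rng.normal(0.0, np.sqrt(eps))               # injected Langevin noise
        theta = theta + 0.5 * eps * (grad_prior + grad_lik) + noise
        samples[t] = theta
    return samples
```

With 1000 draws from N(2, 1), averaging the second half of the returned samples gives an estimate close to the data mean, which is essentially the exact posterior mean under this prior.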
-
ABC-EP: Expectation Propagation for Likelihood-free Bayesian Computation
Many statistical models of interest to the natural and social sciences have no tractable likelihood function. Until recently, Bayesian inference for such models was thought infeasible. Pritchard et al. (1999) introduced an algorithm known as ABC (Approximate Bayesian Computation) that enables Bayesian computation in such models. Despite steady progress since this first breakthrough, such as the adaptation of MCMC and Sequential Monte Carlo techniques to likelihood-free inference, state-of-the-art methods remain notoriously hard to use and require enormous computation times. Among other issues, one faces the difficult task of finding appropriate summary statistics for the model, and tuning the algorithm can be time-consuming when little prior information is available. We show that Expectation Propagation, a widely successful approximate inference technique, can be adapted to the likelihood-free context. The resulting algorithm does not require summary statistics, is an order of magnitude faster than existing techniques, and remains usable when prior information is vague.
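A one-dimensional sketch of the idea, assuming a Gaussian approximating family, one EP site per observation, and a uniform ABC acceptance window of half-width eps around each observed value. Each site update draws parameters from the cavity distribution, simulates one pseudo-observation per draw, keeps the draws whose simulation lands close to that observation, and moment-matches. Because every site compares one data point directly, no summary statistic is needed. The simulator, tolerance, sample counts, and the guard that skips updates which would give a site negative precision are illustrative assumptions, not the published algorithm in full.

```python
import numpy as np

def abc_ep_1d(y, simulate, prior_mean=0.0, prior_var=100.0, eps=0.5,
              n_sims=20000, n_passes=3, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    # Gaussian sites in natural parameters: precision r, precision-times-mean q.
    site_r = np.zeros(n)
    site_q = np.zeros(n)
    prior_r, prior_q = 1.0 / prior_var, prior_mean / prior_var
    for _ in range(n_passes):
        for i in range(n):
            # Cavity: current approximation with site i removed.
            cav_r = prior_r + site_r.sum() - site_r[i]
            cav_q = prior_q + site_q.sum() - site_q[i]
            theta = rng.normal(cav_q / cav_r, np.sqrt(1.0 / cav_r), size=n_sims)
            # Likelihood-free step: simulate, keep draws whose output is near y[i].
            x = simulate(theta, rng)
            accepted = theta[np.abs(x - y[i]) < eps]
            if accepted.size < 2:
                continue  # too few acceptances to estimate moments
            new_r = 1.0 / accepted.var()
            new_q = accepted.mean() * new_r
            if new_r <= cav_r:
                continue  # heuristic guard against negative site precision
            site_r[i] = new_r - cav_r
            site_q[i] = new_q - cav_q
    post_r = prior_r + site_r.sum()
    post_q = prior_q + site_q.sum()
    return post_q / post_r, 1.0 / post_r  # posterior mean and variance
```

As a usage example, with `simulate = lambda th, r: th + r.standard_normal(th.shape)` (a hypothetical simulator for a unit-variance Gaussian location model), the returned mean lands near the sample mean of `y`. Only simulation, never likelihood evaluation, was used.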