TechTalks from event: ICML 2011

Probabilistic Models & MCMC

  • Probabilistic Matrix Addition Authors: Amrudin Agovic; Arindam Banerjee; Snigdhansu Chatterje
    We introduce Probabilistic Matrix Addition (PMA) for modeling real-valued data matrices by simultaneously capturing covariance structure among rows and among columns. PMA additively combines two latent matrices drawn from two Gaussian Processes respectively over rows and columns. The resulting joint distribution over the observed matrix does not factorize over entries, rows, or columns, and can thus capture intricate dependencies in the matrix. Exact inference in PMA is possible, but involves inversion of large matrices, and can be computationally prohibitive. Efficient approximate inference is possible due to the sparse dependency structure among latent variables. We propose two families of approximate inference algorithms for PMA based on Gibbs sampling and MAP inference. We demonstrate the effectiveness of PMA for missing value prediction and multi-label classification problems.
  • SampleRank: Training Factor Graphs with Atomic Gradients Authors: Michael Wick; Khashayar Rohanimanesh; Kedar Bellare; Aron Culotta; Andrew McCallum
    We present SampleRank, an alternative to contrastive divergence (CD) for estimating parameters in complex graphical models. SampleRank harnesses a user-provided loss function to distribute stochastic gradients across an MCMC chain. As a result, parameter updates can be computed between arbitrary MCMC states. SampleRank is not only faster than CD, but also achieves better accuracy in practice (up to 23\% error reduction on noun-phrase coreference).
  • A New Bayesian Rating System for Team Competitions Authors: Sergey Nikolenko; Alexander Sirotkin
    We present a novel probabilistic rating system for team competitions. Building upon TrueSkill(tm), we change the factor graph structure to cope with the problems of TrueSkill(tm), e.g., multiway ties and variable team size. We give detailed inference algorithms for the new structure. Experimental results show a significant improvement over TrueSkill(tm).
  • Bayesian Learning via Stochastic Gradient Langevin Dynamics Authors: Max Welling; Yee Whye Teh
    In this paper we propose a new framework for learning from large scale datasets based on iterative learning from small mini-batches. By adding the right amount of noise to a standard stochastic gradient optimization algorithm we show that the iterates will converge to samples from the true posterior distribution as we anneal the stepsize. This seamless transition between optimization and Bayesian posterior sampling provides an in-built protection against overfitting. We also propose a practical method for Monte Carlo estimates of posterior statistics which monitors a ``sampling threshold'' and collects samples after it has been surpassed. We apply the method to three models: a mixture of Gaussians, logistic regression and ICA with natural gradients.
  • ABC-EP: Expectation Propagation for Likelihood-free Bayesian Computation Authors: Simon Barthelmé; Nicolas Chopin
    Many statistical models of interest to the natural and social sciences have no tractable likelihood function. Until recently, Bayesian inference for such models was thought infeasible. Pritchard et al. (1999) introduced an algorithm known as ABC, for Approximate Bayesian Computation, that enables Bayesian computation in such models. Despite steady progress since this first breakthrough, such as the adaptation of MCMC and Sequential Monte Carlo techniques to likelihood-free inference, state-of-the art methods remain notoriously hard to use and require enormous computation times. Among other issues, one faces the difficult task of finding appropriate summary statistics for the model, and tuning the algorithm can be time-consuming when little prior information is available. We show that Expectation Propagation, a widely successful approximate inference technique, can be adapted to the likelihood-free context. The resulting algorithm does not require summary statistics, is an order of magnitude faster than existing techniques, and remains usable when prior information is vague.