TechTalks from event: ICML 2011

Graphical Models and Bayesian Inference

  • Variational Heteroscedastic Gaussian Process Regression Authors: Miguel Lazaro-Gredilla; Michalis Titsias
    Standard Gaussian processes (GPs) model observations' noise as constant throughout input space. This is often a too restrictive assumption, but one that is needed for GP inference to be tractable. In this work we present a non-standard variational approximation that allows accurate inference in heteroscedastic GPs (i.e., under input-dependent noise conditions). Computational cost is roughly twice that of the standard GP, and also scales as O(n^3). Accuracy is verified by comparing with the golden standard MCMC and its effectiveness is illustrated on several synthetic and real datasets of diverse characteristics. An application to volatility forecasting is also considered.
  • Predicting Legislative Roll Calls from Text Authors: Sean Gerrish; David Blei
    We develop several predictive models linking legislative sentiment to legislative text. Our models, which draw on ideas from ideal point estimation and topic models, predict voting patterns based on the contents of bills and infer the political leanings of legislators. With supervised topics, we provide an exploratory window into how the language of the law is correlated with political support. We also derive approximate posterior inference algorithms based on variational methods. Across 12 years of legislative data, we predict specific voting patterns with high accuracy.
  • Bounding the Partition Function using Holder's Inequality Authors: Qiang Liu; Alexander Ihler
    We describe an algorithm for approximate inference in graphical models based on Holder's inequality that provides upper and lower bounds on common summation problems such as computing the partition function or probability of evidence in a graphical model. Our algorithm unifies and extends several existing approaches, including variable elimination techniques such as mini-bucket elimination and variational methods such as tree reweighted belief propagation and conditional entropy decomposition. We show that our method inherits benefits from each approach to provide significantly better bounds on sum-product tasks.
  • On Bayesian PCA: Automatic Dimensionality Selection and Analytic Solution Authors: Shinichi Nakajima; Masashi Sugiyama; Derin Babacan
    In probabilistic PCA, the fully Bayesian estimation is computationally intractable. To cope with this problem, two types of approximation schemes were introduced: the partially Bayesian PCA (PB-PCA) where only the latent variables are integrated out, and the variational Bayesian PCA (VB-PCA) where the loading vectors are also integrated out. The VB-PCA was proposed as an improved variant of PB-PCA for enabling automatic dimensionality selection (ADS). In this paper, we investigate whether VB-PCA is really the best choice from the viewpoints of computational efficiency and ADS. We first show that ADS is not the unique feature of VB-PCA---PB-PCA is also actually equipped with ADS. We further show that PB-PCA is more advantageous in computational efficiency than VB-PCA because the global solution of PB-PCA can be computed analytically. However, we also show the negative fact that PB-PCA results in a trivial solution in the empirical Bayesian framework. We next consider a simplified variant of VB-PCA, where the latent variables and loading vectors are assumed to be mutually independent (while the ordinary VB-PCA only requires matrix-wise independence). We show that this simplified VB-PCA is the most advantageous in practice because its empirical Bayes solution experimentally works as well as the original VB-PCA, and its global optimal solution can be computed efficiently in a closed form.
  • Bayesian CCA via Group Sparsity Authors: Seppo Virtanen; Arto Klami; Samuel Kaski
    Bayesian treatments of Canonical Correlation Analysis (CCA) -type latent variable models have been recently proposed for coping with overfitting in small sample sizes, as well as for producing factorizations of the data sources into correlated and non-shared effects. However, all of the current implementations of Bayesian CCA and its extensions are computationally inefficient for high-dimensional data and, as shown in this paper, break down completely for high-dimensional sources with low sample count. Furthermore, they cannot reliably separate the correlated effects from non-shared ones. We propose a new Bayesian CCA variant that is computationally efficient and works for high-dimensional data, while also learning the factorization more accurately. The improvements are gained by introducing a group sparsity assumption and an improved variational approximation. The method is demonstrated to work well on multi-label prediction tasks and in analyzing brain correlates of naturalistic audio stimulation.