TechTalks from event: ICML 2011

Neural Networks and Deep Learning

  • Learning attentional policies for tracking and recognition in video with deep networks Authors: Loris Bazzani; Nando de Freitas; Hugo Larochelle; Vittorio Murino; Jo-Anne Ting
    We propose a novel attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of the human perceptual system, the model consists of two interacting pathways: ventral and dorsal. The ventral pathway models object appearance and classification using deep (factored)-restricted Boltzmann machines. At each point in time, the observations consist of retinal images, with decaying resolution toward the periphery of the gaze. The dorsal pathway models the location, orientation, scale and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the dorsal pathway, we encounter an attentional mechanism that learns to control gazes so as to minimize tracking uncertainty. The approach is modular (with each module easily replaceable with more sophisticated algorithms), straightforward to implement, practically efficient, and works well in simple video sequences.
  • Learning Deep Energy Models Authors: Jiquan Ngiam; Zhenghao Chen; Pang Wei Koh; Andrew Ng
    Deep generative models with multiple hidden layers have been shown to be able to learn meaningful and compact representations of data. In this work we propose deep energy models, a class of models that use a deep feedforward neural network to model the energy landscape that defines a probabilistic model. We are able to efficiently train all layers of our model at the same time, allowing the lower layers of the model to adapt to the training of the higher layers, producing better generative models. We evaluate the generative performance of our models on natural images and demonstrate that joint training of multiple layers yields qualitative and quantitative improvements over greedy layerwise training. We further generalize our models beyond the commonly used sigmoidal neural networks and show how a deep extension of the product of Student-t distributions model achieves good generative performance. Finally, we introduce a discriminative extension of our model and demonstrate that it outperforms other fully-connected models on object recognition on the NORB dataset.
  • Unsupervised Models of Images by Spike-and-Slab RBMs Authors: Aaron Courville; James Bergstra; Yoshua Bengio
    The spike and slab Restricted Boltzmann Machine (RBM) is defined by having both a real-valued "slab" variable and a binary "spike" variable associated with each unit in the hidden layer. In this paper we generalize and extend the spike and slab RBM to include non-zero means of the conditional distribution over the observed variables given the binary spike variables. We also introduce a term, quadratic in the observed data, that we exploit to guarantee that all conditionals associated with the model are well defined, a guarantee that was absent in the original spike and slab RBM. The inclusion of these generalizations improves the performance of the spike and slab RBM as a feature learner and achieves competitive performance on the CIFAR-10 image classification task. The spike and slab model, when trained in a convolutional configuration, can generate sensible samples that demonstrate that the model has captured the broad statistical structure of natural images.
  • On Autoencoders and Score Matching for Energy Based Models Authors: Kevin Swersky; Marc'Aurelio Ranzato; David Buchman; Benjamin Marlin; Nando de Freitas
    We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.
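
As a rough illustration of the score matching estimator mentioned in the last abstract above, the sketch below fits the simplest Gaussian energy, E(x) = lam/2 * (x - mu)^2, in one dimension. It does not reproduce the autoencoder correspondence or the mPoT model from the talk; the function names, the synthetic data, and the 1-D setting are all assumptions made for illustration. The point it shows is that the objective depends only on the model score and its derivative, so the partition function never has to be computed.

```python
import random

def score_matching_loss(xs, mu, lam):
    """Empirical score matching objective for E(x) = lam/2 * (x - mu)^2.

    The model score is d/dx log p(x) = -lam * (x - mu) and its derivative
    is -lam, so the objective is mean(0.5 * score^2 + d(score)/dx).
    """
    total = 0.0
    for x in xs:
        score = -lam * (x - mu)
        total += 0.5 * score * score - lam
    return total / len(xs)

def fit_by_score_matching(xs):
    """Closed-form minimizer of the loss above: sample mean and inverse variance."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, 1.0 / var

# Hypothetical synthetic data: mean 2.0, standard deviation 0.5 (precision 4.0).
random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(5000)]
mu_hat, lam_hat = fit_by_score_matching(data)
```

For this particular energy the score matching solution coincides with the maximum likelihood estimate, which is a property of the Gaussian case rather than of score matching in general.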

Latent-Variable Models

  • Topic Modeling with Nonparametric Markov Tree Authors: Haojun Chen; David Dunson; Lawrence Carin
    A new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two unique attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics deep in the tree; and (ii) parsimonious sub-topics are manifested, by removing redundant usage of words at multiple scales. The depth and width of the tree are unbounded within the prior, with a retrospective sampler employed to adaptively infer the appropriate tree size based upon the corpus under study. Excellent quantitative results are manifested on five standard data sets, and the inferred tree structure is also found to be highly interpretable.
  • Infinite SVM: a Dirichlet Process Mixture of Large-margin Kernel Machines Authors: Jun Zhu; Ning Chen; Eric Xing
    We present Infinite SVM (iSVM), a Dirichlet process mixture of large-margin kernel machines for multi-way classification. An iSVM enjoys the advantages of both Bayesian nonparametrics in handling the unknown number of mixing components, and large-margin kernel machines in robustly capturing local nonlinearity of complex data. We develop an efficient variational learning algorithm for posterior inference of iSVM, and we demonstrate the advantages of iSVM over Dirichlet process mixture of generalized linear models and other benchmarks on both synthetic and real Flickr image classification datasets.
  • Piecewise Bounds for Estimating Bernoulli-Logistic Latent Gaussian Models Authors: Benjamin Marlin*, University of British Columbia; Mohammad Khan, University of British Columbia; Kevin Murphy, University of Br
    Bernoulli-logistic latent Gaussian models (bLGMs) are a useful model class, but accurate parameter estimation is complicated by the fact that the marginal likelihood contains an intractable logistic-Gaussian integral. In this work, we propose the use of fixed piecewise linear and quadratic upper bounds to the logistic-log-partition (LLP) function as a way of circumventing this intractable integral. We describe a framework for approximately computing minimax optimal piecewise quadratic bounds, as well as a generalized expectation maximization algorithm based on using piecewise bounds to estimate bLGMs. We prove a theoretical result relating the maximum error in the LLP bound to the maximum error in the marginal likelihood estimate. Finally, we present empirical results showing that piecewise bounds can be significantly more accurate than previously proposed variational bounds.
  • A Spectral Algorithm for Latent Tree Graphical Models Authors: Ankur Parikh; Le Song; Eric Xing
    Latent variable models are powerful tools for probabilistic modeling, and have been successfully applied to various domains, such as speech analysis and bioinformatics. However, parameter learning algorithms for latent variable models have predominantly relied on local search heuristics such as expectation maximization (EM). We propose a fast, local-minimum-free spectral algorithm for learning latent variable models with arbitrary tree topologies, and show that the joint distribution of the observed variables can be reconstructed from the marginals of triples of observed variables irrespective of the maximum degree of the tree. We demonstrate the performance of our spectral algorithm on synthetic and real datasets; for large training sizes, our algorithm performs comparably to or better than EM while being orders of magnitude faster.
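
To make the idea of a piecewise linear upper bound on the logistic-log-partition function log(1 + exp(x)) from the piecewise-bounds talk above more concrete, here is a minimal sketch. It is not the minimax-optimal construction described in that abstract: it simply joins evenly spaced knots with chords, which lie above a convex function, and uses two simple tail bounds. The knot placement and the function names are assumptions chosen for illustration.

```python
import math

def llp(x):
    """Logistic-log-partition function log(1 + exp(x)), computed stably."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def chord_upper_bound(x, knots):
    """Piecewise linear upper bound on llp built from chords between sorted knots.

    llp is convex, so each chord lies on or above it between its endpoints.
    Left tail: llp is increasing, so llp(x) <= llp(knots[0]).
    Right tail: llp(x) - x = llp(-x) is decreasing, so llp(x) <= x + llp(-knots[-1]).
    """
    if x <= knots[0]:
        return llp(knots[0])
    if x >= knots[-1]:
        return x + llp(-knots[-1])
    for lo, hi in zip(knots, knots[1:]):
        if lo <= x <= hi:
            t = (x - lo) / (hi - lo)
            return (1.0 - t) * llp(lo) + t * llp(hi)

# Hypothetical knot placement: unit spacing on [-6, 6].
knots = [float(k) for k in range(-6, 7)]
grid = [i / 100.0 for i in range(-1000, 1001)]
max_gap = max(chord_upper_bound(x, knots) - llp(x) for x in grid)
```

Here `max_gap` measures the worst-case looseness of this naive bound on the grid; optimizing the knots and slopes, as the talk proposes, would aim to shrink that quantity for a fixed number of pieces.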

Active and Online Learning

  • Speeding-Up Hoeffding-Based Regression Trees With Options Authors: Elena Ikonomovska; João Gama; Bernard Zenko; Saso Dzeroski
    Data streams are ubiquitous and have in the last two decades become an important research topic. For their predictive non-parametric analysis, Hoeffding-based trees are often a method of choice, offering a possibility of any-time predictions. However, one of their main problems is the delay in learning progress due to the existence of equally discriminative attributes. Options are a natural way to deal with this problem. Option trees build upon regular trees by adding splitting options in the internal nodes. As such they are known to improve accuracy and stability and to reduce ambiguity. In this paper, we present on-line option trees for faster learning on numerical data streams. Our results show that options improve the any-time performance of ordinary on-line regression trees, while preserving the interpretable structure of trees and without significantly increasing the computational complexity of the algorithm.
  • Adaptively Learning the Crowd Kernel Authors: Omer Tamuz; Ce Liu; Serge Belongie; Ohad Shamir; Adam Kalai
    We introduce an algorithm that, given n objects, learns a similarity matrix over all n^2 pairs, from crowdsourced data *alone*. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form "is object a more similar to b or to c?" and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objects into Euclidean space (like MDS); we refer to this as the "crowd kernel." SVMs reveal that the crowd kernel captures prominent and subtle features across a number of domains, such as "is striped" among neckties and "vowel vs. consonant" among letters.
  • Bundle Selling by Online Estimation of Valuation Functions Authors: Daniel Vainsencher; Ofer Dekel; Shie Mannor
    We consider the problem of online selection of a bundle of items when the cost of each item changes arbitrarily from round to round and the valuation function is initially unknown and revealed only through the noisy values of selected bundles (the bandit feedback setting). We are interested in learning schemes that have a small regret compared to an agent who knows the true valuation function. Since there are exponentially many bundles, further assumptions on the valuation functions are needed. We make the assumption that the valuation function is supermodular and has non-linear interactions that are of low degree in a novel sense. We develop efficient learning algorithms that balance exploration and exploitation to achieve low regret in this setting.
  • Active Learning from Crowds Authors: Yan Yan; Romer Rosales; Glenn Fung; Jennifer Dy
    Obtaining labels is expensive or time-consuming, but unlabeled data is often abundant and easy to obtain. Many learning tasks can profit from intelligently choosing unlabeled instances to be labeled by an oracle, an approach known as active learning, instead of simply labeling all the data or randomly selecting data to be labeled. Supervised learning traditionally relies on an oracle playing the role of a teacher. In the multiple annotator paradigm, an oracle who knows the ground truth no longer exists; instead, multiple labelers, with varying expertise, are available for querying. This paradigm poses new challenges to the active learning scenario. We can ask which data sample should be labeled next and which annotator we should query to benefit our learning model the most. In this paper, we develop a probabilistic model for learning from multiple annotators that can also learn the annotator expertise even when their expertise may not be consistently accurate (or inaccurate) across the task domain. In addition, we provide an optimization formulation that allows us to simultaneously learn the most uncertain sample and the annotator(s) to query for its label. Our active learning approach combines both intelligently selecting samples to label and learning from expertise among multiple labelers to improve learning performance.
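
The delay that the Hoeffding-based regression trees talk in this session attributes to equally discriminative attributes can be seen from the standard Hoeffding bound alone. The sketch below is not the option-tree algorithm from that talk; it assumes the classic Hoeffding-tree rule of splitting once the gap between the two best split merits exceeds the bound, and the merit values, function names, and parameters are hypothetical.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the mean of n samples
    of a variable with the given range is within epsilon of its true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_merit, second_merit, value_range, delta, n):
    """Classic Hoeffding-tree test: split only when the observed gap between
    the two best candidates exceeds the bound."""
    return best_merit - second_merit > hoeffding_bound(value_range, delta, n)

def samples_needed(gap, value_range, delta):
    """Smallest n for which the bound drops strictly below a fixed merit gap."""
    return math.floor(value_range ** 2 * math.log(1.0 / delta) / (2.0 * gap ** 2)) + 1

# Hypothetical merit gaps: a clear winner versus two nearly tied attributes.
n_clear = samples_needed(gap=0.1, value_range=1.0, delta=1e-6)
n_tied = samples_needed(gap=0.01, value_range=1.0, delta=1e-6)
```

Because the required sample count grows with the inverse square of the gap, shrinking the gap by a factor of ten makes the node wait roughly a hundred times longer before it can split, which is the kind of stall that splitting options are meant to avoid.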