TechTalks from event: ICML 2011
Neural Networks and Deep Learning
Learning attentional policies for tracking and recognition in video with deep networks
We propose a novel attentional model for simultaneous object tracking and recognition that is driven by gaze data. Motivated by theories of the human perceptual system, the model consists of two interacting pathways: ventral and dorsal. The ventral pathway models object appearance and classification using deep (factored)-restricted Boltzmann machines. At each point in time, the observations consist of retinal images, with decaying resolution toward the periphery of the gaze. The dorsal pathway models the location, orientation, scale and speed of the attended object. The posterior distribution of these states is estimated with particle filtering. Deeper in the dorsal pathway, we encounter an attentional mechanism that learns to control gazes so as to minimize tracking uncertainty. The approach is modular (with each module easily replaceable with more sophisticated algorithms), straightforward to implement, practically efficient, and works well in simple video sequences.
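The abstract above estimates the posterior over the attended object's state with particle filtering. As a rough illustration of that idea only, here is a minimal bootstrap particle filter for a single 1-D position. The constant-velocity motion model, the Gaussian noise levels, and the names `particle_filter_step` and `track` are assumptions made for this sketch and are not taken from the paper, whose state also includes orientation, scale and speed.

```python
import math
import random


def particle_filter_step(particles, observation, rng, velocity=1.0,
                         process_std=0.2, obs_std=0.5):
    """One predict/weight/resample step of a bootstrap particle filter (1-D)."""
    # Predict: move each particle with the assumed motion model plus noise.
    predicted = [p + velocity + rng.gauss(0.0, process_std) for p in particles]
    # Weight: Gaussian likelihood of the observation given each particle.
    weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
               for p in predicted]
    total = sum(weights)
    if total == 0.0:
        # All likelihoods underflowed; fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]
    # Posterior mean estimate of the position.
    estimate = sum(w * p for w, p in zip(weights, predicted))
    # Resample: draw particles in proportion to their weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


def track(num_steps=20, num_particles=500, seed=0):
    """Track a simulated target that moves one unit per step."""
    rng = random.Random(seed)
    true_pos = 0.0
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    estimate = 0.0
    for _ in range(num_steps):
        true_pos += 1.0
        observation = true_pos + rng.gauss(0.0, 0.5)  # noisy measurement
        particles, estimate = particle_filter_step(particles, observation, rng)
    return true_pos, estimate
```

Calling `track()` returns the simulated true position and the filter's final estimate, which should lie close together under these noise settings.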
Learning Deep Energy Models
Deep generative models with multiple hidden layers have been shown to be able to learn meaningful and compact representations of data. In this work we propose deep energy models, a class of models that use a deep feedforward neural network to model the energy landscape that defines a probabilistic model. We are able to efficiently train all layers of our model at the same time, allowing the lower layers of the model to adapt to the training of the higher layers, producing better generative models. We evaluate the generative performance of our models on natural images and demonstrate that joint training of multiple layers yields qualitative and quantitative improvements over greedy layerwise training. We further generalize our models beyond the commonly used sigmoidal neural networks and show how a deep extension of the product of Student-t distributions model achieves good generative performance. Finally, we introduce a discriminative extension of our model and demonstrate that it outperforms other fully-connected models on object recognition on the NORB dataset.
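To make concrete what it means for a feedforward network to define an energy landscape, the sketch below builds a tiny two-layer network that maps a scalar input to a scalar energy, then converts energies into probabilities via p(x) ∝ exp(-E(x)) on a small grid. The weights, the softplus nonlinearity, the summed top layer, and the grid-based normalization are all illustrative assumptions. The paper's actual architecture and training procedure are not reproduced here.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def energy(x, layers):
    """Scalar energy of input x under a small feedforward net.

    `layers` is a list of (weight_rows, biases) pairs; the values used
    below are hypothetical and chosen only for illustration.
    """
    activations = [x]
    for weight_rows, biases in layers:
        activations = [
            softplus(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weight_rows, biases)
        ]
    # Sum the top-layer outputs to obtain a single energy value.
    return sum(activations)


def grid_probabilities(grid, layers):
    """Normalize exp(-E(x)) over a finite grid of inputs."""
    energies = [energy(x, layers) for x in grid]
    e_min = min(energies)  # subtract the minimum for numerical stability
    unnormalized = [math.exp(-(e - e_min)) for e in energies]
    partition = sum(unnormalized)
    return [u / partition for u in unnormalized]


# Hypothetical network: two hidden units, then one top unit.
LAYERS = [
    ([[1.0], [-1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
GRID = [-2.0, -1.0, 0.0, 1.0, 2.0]
PROBS = grid_probabilities(GRID, LAYERS)
```

With these weights the energy is symmetric and lowest at x = 0, so `PROBS` peaks at the middle grid point. For continuous, high-dimensional data the normalizing sum is intractable, which is why such models need specialized training methods.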
Unsupervised Models of Images by Spike-and-Slab RBMs
The spike and slab Restricted Boltzmann Machine (RBM) is defined by having both a real-valued "slab" variable and a binary "spike" variable associated with each unit in the hidden layer. In this paper we generalize and extend the spike and slab RBM to include non-zero means of the conditional distribution over the observed variables given the binary spike variables. We also introduce a term, quadratic in the observed data, that we exploit to guarantee that all conditionals associated with the model are well defined -- a guarantee that was absent in the original spike and slab RBM. The inclusion of these generalizations improves the performance of the spike and slab RBM as a feature learner and achieves competitive performance on the CIFAR-10 image classification task. The spike and slab model, when trained in a convolutional configuration, can generate sensible samples that demonstrate that the model has captured the broad statistical structure of natural images.
On Autoencoders and Score Matching for Energy Based Models
We consider estimation methods for the class of continuous-data energy based models (EBMs). Our main result shows that estimating the parameters of an EBM using score matching when the conditional distribution over the visible units is Gaussian corresponds to training a particular form of regularized autoencoder. We show how different Gaussian EBMs lead to different autoencoder architectures, providing deep links between these two families of models. We compare the score matching estimator for the mPoT model, a particular Gaussian EBM, to several other training methods on a variety of tasks including image denoising and unsupervised feature extraction. We show that the regularization function induced by score matching leads to superior classification performance relative to a standard autoencoder. We also show that score matching yields classification results that are indistinguishable from better-known stochastic approximation maximum likelihood estimators.
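For readers unfamiliar with score matching, the sketch below evaluates the empirical score matching objective, the average of ½ψ(x)² + ψ′(x) with ψ(x) = d/dx log p(x), for the simplest possible case: a 1-D Gaussian with parameters `mean` and `variance`. This toy case is an assumption chosen for illustration and does not reproduce the paper's autoencoder correspondence or the mPoT model. It only shows that the objective can be minimized without computing a partition function, and that for a Gaussian the minimizer is the sample mean and the (biased) sample variance.

```python
def score_matching_objective(data, mean, variance):
    """Empirical score matching objective for a 1-D Gaussian model."""
    total = 0.0
    for x in data:
        # Score: derivative of log-density with respect to x.
        score = -(x - mean) / variance
        # Derivative of the score with respect to x (constant for a Gaussian).
        score_derivative = -1.0 / variance
        total += 0.5 * score ** 2 + score_derivative
    return total / len(data)


# Toy data (hypothetical values chosen for easy arithmetic).
DATA = [0.5, 1.5, 2.0, 4.0]
SAMPLE_MEAN = sum(DATA) / len(DATA)
SAMPLE_VAR = sum((x - SAMPLE_MEAN) ** 2 for x in DATA) / len(DATA)
BEST = score_matching_objective(DATA, SAMPLE_MEAN, SAMPLE_VAR)
```

Evaluating the objective at any other mean or variance gives a larger value than `BEST`, which equals -1/(2 · `SAMPLE_VAR`) in this Gaussian case.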