CVPR 2014 Video Spotlights
TechTalks from event: CVPR 2014 Video Spotlights
Orals 1D : Action Recognition
- Multi-View Super Vector for Action Recognition
Images and videos are often characterized by multiple types of local descriptors, such as SIFT, HOG, and HOF, each of which describes certain aspects of an object's features. Recognition systems benefit from fusing multiple types of these descriptors. Two widely applied fusion pipelines are descriptor concatenation and kernel average. The first is effective when different descriptors are strongly correlated, while the second is probably better when descriptors are relatively independent. In practice, however, different descriptors are neither fully independent nor fully correlated, and previous fusion methods may not be satisfactory. In this paper, we propose a new global representation, the Multi-View Super Vector (MVSV), which is composed of relatively independent components derived from a pair of descriptors. Kernel average is then applied on these components to produce the recognition result. To obtain MVSV, we develop a generative mixture model of probabilistic canonical correlation analyzers (M-PCCA), and utilize the hidden factors and gradient vectors of M-PCCA to construct MVSV for video representation. Experiments on video-based action recognition tasks show that MVSV achieves promising results, and outperforms FV and VLAD with either the descriptor-concatenation or the kernel-average fusion strategy.
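The two baseline fusion pipelines the abstract contrasts can be illustrated in a few lines. This is a minimal sketch, not the paper's MVSV method: the descriptor matrices and their dimensions are hypothetical, and linear kernels are used for simplicity (for linear kernels the two pipelines coincide up to a scale factor, which is why the distinction only matters for nonlinear kernels or correlated descriptors).

```python
import numpy as np

# Hypothetical per-video descriptor encodings (e.g. pooled HOG and HOF
# features); the names, sizes, and values are illustrative only.
rng = np.random.default_rng(0)
hog = rng.normal(size=(4, 8))   # 4 videos, 8-dim HOG encoding
hof = rng.normal(size=(4, 8))   # 4 videos, 8-dim HOF encoding

def linear_kernel(X):
    """Linear kernel matrix: K[i, j] = <x_i, x_j>."""
    return X @ X.T

# Pipeline 1: descriptor concatenation, then one kernel on the joint vector.
concat = np.hstack([hog, hof])
K_concat = linear_kernel(concat)

# Pipeline 2: kernel average -- one kernel per descriptor, then average.
K_avg = (linear_kernel(hog) + linear_kernel(hof)) / 2.0

# For linear kernels, concatenation gives K_hog + K_hof = 2 * K_avg,
# so the two pipelines differ only by scale in this special case.
assert np.allclose(K_concat, 2.0 * K_avg)
```

With nonlinear kernels (e.g. RBF or chi-squared) the equivalence breaks down, which is where the choice between the two strategies, and the paper's alternative, becomes meaningful.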
- Unsupervised Spectral Dual Assignment Clustering of Human Actions in Context
A recent trend of research has shown how contextual information related to an action, such as a scene or object, can enhance the accuracy of human action recognition systems. However, using context to improve unsupervised human action clustering has never been considered before, and cannot be achieved using existing clustering methods. To solve this problem, we introduce a novel, general-purpose algorithm, Dual Assignment k-Means (DAKM), which is uniquely capable of performing two co-occurring clustering tasks simultaneously while exploiting the correlation information to enhance both clusterings. Furthermore, we describe a spectral extension of DAKM (SDAKM) for better performance on realistic data. Extensive experiments on synthetic data and on three realistic human action datasets with scene context show that DAKM/SDAKM can significantly outperform state-of-the-art clustering methods by taking into account the contextual relationship between actions and scenes.
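To make the setting concrete, the sketch below clusters two co-occurring views of the same videos (an "action" feature and a "scene" feature) with plain k-means and then builds the action-scene co-occurrence matrix that a dual-assignment method could exploit. This is not the paper's DAKM algorithm, only an illustration of the problem structure; all data, dimensions, and names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns hard cluster assignments."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy data: 20 videos, each with a 2-D action feature and a 2-D scene
# feature; actions and scenes are correlated by construction.
rng = np.random.default_rng(1)
actions = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(3, 0.1, (10, 2))])
scenes = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(3, 0.1, (10, 2))])

a_labels = kmeans(actions, 2)   # clustering task 1: actions
s_labels = kmeans(scenes, 2)    # clustering task 2: scenes

# Co-occurrence counts C[i, j] = number of videos assigned to action
# cluster i and scene cluster j. A dual-assignment method couples the
# two clusterings through statistics like these instead of running
# each k-means independently.
C = np.zeros((2, 2), dtype=int)
for a, s in zip(a_labels, s_labels):
    C[a, s] += 1
print(C)
```

When the two views are correlated, as here, the co-occurrence matrix is far from uniform, and that correlation is exactly the signal the abstract says independent clustering methods leave unused.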
- All Sessions
- Orals 1A : Matching & Reconstruction
- Orals 1B : Segmentation & Grouping
- Posters 1A : Recognition, Segmentation, Stereo & SFM
- Orals 1C : Statistical Methods & Learning I
- Orals 1D : Action Recognition
- Posters 1B : 3D Vision, Action Recognition, Recognition, Statistical Methods & Learning
- Orals 2A : Motion & Tracking
- Orals 2B : Discrete Optimization
- Posters 2A : Motion & Tracking, Optimization, Statistical Methods & Learning, Stereo & SFM
- Posters 2B : Face & Gesture, Recognition
- Orals 3A : Physics-Based Vision & Shape-from-X
- Orals 3B : Video: Events, Activities & Surveillance
- Posters 3A : Physics-Based Vision, Recognition, Video: Events, Activities & Surveillance
- Orals 3C : Medical & Biological Image Analysis
- Orals 3D : Low-Level Vision & Image Processing
- Posters 3B : Biologically Inspired Vision, Low-Level Vision, Medical & Biological Image Analysis, Segmentation
- Orals 4A : Computational Photography: Sensing and Display
- Orals 4B : Recognition: Detection, Categorization, Classification
- Posters 4A : Computational Photography, Motion & Tracking, Recognition
- Orals 4C : 3D Geometry & Shape
- Orals 4F : View Synthesis & Other Applications
- Posters 4B : 3D Vision, Document Analysis, Optimization Methods, Shape, Vision for Graphics, Web & Vision Systems
- Orals 2F : Convolutional Neural Networks