TechTalks from event: CVPR 2014 Video Spotlights

Posters 4B : 3D Vision, Document Analysis, Optimization Methods, Shape, Vision for Graphics, Web & Vision Systems

  • Exploiting Shading Cues in Kinect IR Images for Geometry Refinement Authors: Gyeongmin Choe, Jaesik Park, Yu-Wing Tai, In So Kweon
    In this paper, we propose a method to refine the geometry of 3D meshes from Kinect fusion by exploiting shading cues captured by the infrared (IR) camera of the Kinect. A major benefit of using the Kinect IR camera instead of an RGB camera is that the IR images captured by the Kinect are narrow-band images that filter out most undesired ambient light, which makes our system robust to natural indoor illumination. We define a near-light IR shading model which describes the captured intensity as a function of surface normals, albedo, lighting direction, and the distance between the light source and surface points. To resolve the ambiguity in our model between normals and distance, we utilize an initial 3D mesh from Kinect fusion and multi-view information to reliably estimate surface details that were not reconstructed by Kinect fusion. Our approach operates directly on a 3D mesh model for geometry refinement. The effectiveness of our approach is demonstrated through several challenging real-world examples.
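    The near-light shading model described above reduces, per surface point, to albedo times the cosine of the incidence angle, attenuated by the squared distance to the light source. A minimal sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def near_light_ir_intensity(point, normal, light_pos, albedo):
    """Predicted IR intensity under a near-light shading model:
    albedo * cos(incidence angle) / distance^2, as a function of the
    surface point, its unit normal, and the light source position."""
    to_light = light_pos - point
    d = np.linalg.norm(to_light)              # distance to the light
    l = to_light / d                          # unit lighting direction
    cos_theta = max(np.dot(normal, l), 0.0)   # clamp back-facing shading
    return albedo * cos_theta / d**2
```

    The ambiguity the paper resolves is visible here: scaling the distance and adjusting the normal can produce the same intensity, which is why an initial mesh and multi-view cues are needed.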
  • Turning Mobile Phones into 3D Scanners Authors: Kalin Kolev, Petri Tanskanen, Pablo Speciale, Marc Pollefeys
    In this paper, we propose an efficient and accurate scheme for the integration of multiple stereo-based depth measurements. For each provided depth map a confidence-based weight is assigned to each depth estimate by evaluating local geometry orientation, underlying camera setting and photometric evidence. Subsequently, all hypotheses are fused together into a compact and consistent 3D model. Thereby, visibility conflicts are identified and resolved, and fitting measurements are averaged with regard to their confidence scores. The individual stages of the proposed approach are validated by comparing it to two alternative techniques which rely on a conceptually different fusion scheme and a different confidence inference, respectively. Pursuing live 3D reconstruction on mobile devices as a primary goal, we demonstrate that the developed method can easily be integrated into a system for monocular interactive 3D modeling by substantially improving its accuracy while adding a negligible overhead to its performance and retaining its interactive potential.
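    For a single surface point, the fusion of fitting measurements described above is a confidence-weighted average of the per-view depth hypotheses. A minimal sketch (visibility-conflict handling, which the paper also performs, is omitted):

```python
import numpy as np

def fuse_depths(depths, confidences):
    """Confidence-weighted average of per-view depth hypotheses
    for one surface point (illustrative fusion step only)."""
    depths = np.asarray(depths, dtype=float)
    w = np.asarray(confidences, dtype=float)
    return float(np.sum(w * depths) / np.sum(w))
```

    Equal confidences reduce to a plain mean; a high-confidence estimate pulls the fused depth toward itself.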
  • Generalized Pupil-Centric Imaging and Analytical Calibration for a Non-frontal Camera Authors: Avinash Kumar, Narendra Ahuja
    We consider the problem of calibrating a small field of view central perspective non-frontal camera whose lens and sensor planes may not be parallel to each other. This can be due to manufacturing defects or intentional tilting; thus, all cameras can be modeled as non-frontal to varying degrees. There are two approaches to modeling non-frontal cameras. The first, based on a rotation parameterization of sensor non-frontalness/tilt, increases the number of calibration parameters, thus requiring heuristics to initialize a few calibration parameters for the final non-linear optimization step. Additionally, for this parameterization, while it has been shown that a pupil-centric imaging model leads to more accurate rotation estimates than a thin-lens imaging model, it has only been developed for a single-axis lens-sensor tilt. But in real cameras we can have arbitrary tilt. The second approach, based on decentering distortion modeling, is approximate, as it can only handle small tilts and cannot explicitly estimate the sensor tilt. In this paper, we focus on rotation-based non-frontal camera calibration and address the aforementioned problems of over-parameterization and the inadequacy of the existing pupil-centric imaging model. We first derive a generalized pupil-centric imaging model for arbitrary-axis lens-sensor tilt. We then derive an analytical solution, in this setting, for a subset of calibration parameters including sensor rotation angles as a function of the center of radial distortion (CoD). A radial-alignment-based constraint is then proposed to computationally estimate the CoD, leveraging the proposed analytical solution. Our analytical technique also estimates the pupil-centric parameters of entrance pupil location and optical focal length, which have typically been obtained optically. Given these analytical and computational calibration parameter estimates, we initialize the non-linear calibration optimization for a set of synthetic and real data captured from a non-frontal camera and show reduced pixel re-projection and undistortion errors compared to state-of-the-art techniques in rotation- and decentering-based approaches to non-frontal camera calibration.
  • 3D Modeling from Wide Baseline Range Scans using Contour Coherence Authors: Ruizhe Wang, Jongmoo Choi, Gérard Medioni
    Registering two or more range scans is a fundamental problem with applications in 3D modeling. While this problem is well addressed by existing techniques such as ICP when the views overlap significantly and a good initialization is available, no satisfactory solution exists for wide baseline registration. We propose here a novel approach which leverages contour coherence and allows us to align two wide baseline range scans with limited overlap from a poor initialization. Inspired by ICP, we maximize the contour coherence by building robust corresponding pairs on apparent contours and minimizing their distances in an iterative fashion. We use the contour coherence under a multi-view rigid registration framework, and this enables the reconstruction of accurate and complete 3D models from as few as 4 frames. We further extend it to handle articulations, and this allows us to model articulated objects such as the human body. Experimental results on both synthetic and real data demonstrate the effectiveness and robustness of our contour coherence based registration approach to wide baseline range scans, and to 3D modeling.
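    The ICP-style inner step, fitting a rigid transform to already-matched point pairs, can be illustrated with the standard least-squares (Kabsch) solution in 2D; building the correspondences on apparent contours, which is the paper's actual contribution, is assumed done here:

```python
import numpy as np

def rigid_fit_2d(P, Q):
    """Least-squares rigid transform (R, t) mapping matched 2D points
    P onto Q (Kabsch/Procrustes): center both sets, take the SVD of the
    cross-covariance, and correct for a possible reflection."""
    Pc, Qc = P.mean(axis=0), Q.mean(axis=0)
    H = (P - Pc).T @ (Q - Qc)                 # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))    # reflection guard
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = Qc - R @ Pc
    return R, t
```

    Iterating "match contour points, then fit" is the ICP-like loop the abstract describes.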
  • Orientation Robust Text Line Detection in Natural Images Authors: Le Kang, Yi Li, David Doermann
    In this paper, higher-order correlation clustering (HOCC) is used for text line detection in natural images. We treat text line detection as a graph partitioning problem, where each vertex is represented by a Maximally Stable Extremal Region (MSER). First, weak hypotheses are proposed by coarsely grouping MSERs based on their spatial alignment and appearance consistency. Then, higher-order correlation clustering (HOCC) is used to partition the MSERs into text line candidates, using the hypotheses as soft constraints to enforce long range interactions. We further propose a regularization method to solve the Semidefinite Programming problem in the inference. Finally we use a simple texton-based texture classifier to filter out the non-text areas. This framework allows us to naturally handle multiple orientations, languages and fonts. Experiments show that our approach achieves competitive performance compared to the state of the art.
  • Region-based Discriminative Feature Pooling for Scene Text Recognition Authors: Chen-Yu Lee, Anurag Bhardwaj, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
    We present a new feature representation method for the scene text recognition problem, particularly focusing on improving scene character recognition. Many existing methods rely on Histogram of Oriented Gradient (HOG) or part-based models, which do not span the feature space well for characters in natural scene images, especially given large variation in fonts with cluttered backgrounds. In this work, we propose a discriminative feature pooling method that automatically learns the most informative sub-regions of each scene character within a multi-class classification framework, where each sub-region seamlessly integrates a set of low-level image features through integral images. The proposed feature representation is compact, computationally efficient, and able to effectively model distinctive spatial structures of each individual character class. Extensive experiments conducted on challenging datasets (Chars74K, ICDAR'03, ICDAR'11, SVT) show that our method significantly outperforms existing methods on scene character classification and scene text recognition tasks.
  • Quality-based Multimodal Classification using Tree-Structured Sparsity Authors: Soheil Bahrampour, Asok Ray, Nasser M. Nasrabadi, Kenneth W. Jenkins
    Recent studies have demonstrated advantages of information fusion based on sparsity models for multimodal classification. Among several sparsity models, tree-structured sparsity provides a flexible framework for extraction of cross-correlated information from different sources and for enforcing group sparsity at multiple granularities. However, the existing algorithm only solves an approximated version of the cost functional and the resulting solution is not necessarily sparse at group levels. This paper reformulates the tree-structured sparse model for the multimodal classification task. An accelerated proximal algorithm is proposed to solve the optimization problem, which is an efficient tool for feature-level fusion among either homogeneous or heterogeneous sources of information. In addition, a (fuzzy-set-theoretic) possibilistic scheme is proposed to weight the available modalities, based on their respective reliability, in a joint optimization problem for finding the sparsity codes. This approach provides a general framework for quality-based fusion that offers added robustness to several sparsity-based multimodal classification algorithms. To demonstrate their efficacy, the proposed methods are evaluated on three different applications - multiview face recognition, multimodal face recognition, and target classification.
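    Group-level sparsity of the kind discussed above is typically enforced through the proximal operator of a group penalty. A minimal sketch for non-overlapping groups (the paper's tree-structured prox must handle nested groups; this only shows the group-zeroing behavior that makes solutions sparse at the group level):

```python
import numpy as np

def group_soft_threshold(x, groups, lam):
    """Proximal operator of a non-overlapping group-lasso penalty:
    each group of coefficients is shrunk toward zero, and zeroed out
    entirely when its Euclidean norm falls below lam."""
    out = x.copy()
    for g in groups:
        norm = np.linalg.norm(x[g])
        out[g] = 0.0 if norm <= lam else (1 - lam / norm) * x[g]
    return out
```

    A proximal-gradient (or accelerated proximal) solver alternates a gradient step on the data term with this operator.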
  • Newton Greedy Pursuit: A Quadratic Approximation Method for Sparsity-Constrained Optimization Authors: Xiao-Tong Yuan, Qingshan Liu
    First-order greedy selection algorithms have been widely applied to sparsity-constrained optimization. The main theme of this type of method is to evaluate the function gradient in the previous iteration to update the non-zero entries and their values in the next iteration. In contrast, relatively little effort has been made to study second-order greedy selection methods that additionally utilize Hessian information. Inspired by the classic constrained Newton method, we propose in this paper the NewTon Greedy Pursuit (NTGP) method to approximately minimize a twice differentiable function under a sparsity constraint. At each iteration, NTGP constructs a second-order Taylor expansion to approximate the cost function, and estimates the next iterate as the solution of the constructed quadratic model under the sparsity constraint. The parameter estimation error and convergence properties of NTGP are analyzed. The superiority of NTGP over several representative first-order greedy selection methods is demonstrated on synthetic and real sparse logistic regression tasks.
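    The quadratic-model idea can be sketched in a heavily simplified form: take a Newton step on the local quadratic approximation, then hard-threshold to the k largest-magnitude entries. This is not NTGP itself, which solves the sparsity-constrained quadratic subproblem more carefully; it only illustrates how second-order information enters the update:

```python
import numpy as np

def newton_hard_threshold_step(x, grad, hess, k):
    """One simplified second-order step: minimize the local quadratic
    model via a Newton step, then project onto k-sparse vectors by
    keeping the k largest-magnitude entries (hard thresholding)."""
    step = np.linalg.solve(hess, grad)        # Newton direction
    x_new = x - step
    support = np.argsort(np.abs(x_new))[-k:]  # indices of k largest entries
    x_sparse = np.zeros_like(x_new)
    x_sparse[support] = x_new[support]
    return x_sparse
```

    With an identity Hessian this degenerates to a first-order hard-thresholding step, which is exactly the contrast the abstract draws.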
  • Noising versus Smoothing for Vertex Identification in Unknown Shapes Authors: Konstantinos A. Raftopoulos, Marin Ferecatu
    A method is presented for identifying local shape features on a shape's boundary in a way that is facilitated by the presence of noise. The boundary is seen as a real function. A study of a certain distance function reveals, almost counter-intuitively, that vertices can be defined and localized better in the presence of noise; thus the concept of noising, as opposed to smoothing, is conceived and presented. The method works on both smooth and noisy shapes, with the presence of noise improving on the results of the smoothed version. Experiments with noise and a comparison to the state of the art validate the method.
  • Dense Non-Rigid Shape Correspondence using Random Forests Authors: Emanuele Rodolà
    We propose a shape matching method that produces dense correspondences tuned to a specific class of shapes and deformations. In a scenario where this class is represented by a small set of example shapes, the proposed method learns a shape descriptor capturing the variability of the deformations in the given class. The approach enables the wave kernel signature to extend the class of recognized deformations from near isometries to the deformations appearing in the example set by means of a random forest classifier. With the help of the introduced spatial regularization, the proposed method achieves significant improvements over the baseline approach and obtains state-of-the-art results while keeping short computation times.
  • Covariance Descriptors for 3D Shape Matching and Retrieval Authors: Hedi Tabia, Hamid Laga, David Picard, Philippe-Henri Gosselin
    Several descriptors have been proposed in the past for 3D shape analysis, yet none of them achieves best performance on all shape classes. In this paper we propose a novel method for 3D shape analysis using the covariance matrices of the descriptors rather than the descriptors themselves. Covariance matrices enable efficient fusion of different types of features and modalities. They capture, using the same representation, not only the geometric and the spatial properties of a shape region but also the correlation of these properties within the region. Covariance matrices, however, lie on the manifold of Symmetric Positive Definite (SPD) tensors, a special type of Riemannian manifolds, which makes comparison and clustering of such matrices challenging. In this paper we study covariance matrices in their native space and make use of geodesic distances on the manifold as a dissimilarity measure. We demonstrate the performance of this metric on 3D face matching and recognition tasks. We then generalize the Bag of Features paradigm, originally designed in Euclidean spaces, to the Riemannian manifold of SPD matrices. We propose a new clustering procedure that takes into account the geometry of the Riemannian manifold. We evaluate the performance of the proposed Bag of Covariance Matrices framework on 3D shape matching and retrieval applications and demonstrate its superiority compared to descriptor-based techniques.
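    A common choice of geodesic distance on the SPD manifold is the affine-invariant metric, computed from the generalized eigenvalues of the two matrices; whether the paper uses exactly this metric is our assumption. A numpy-only sketch:

```python
import numpy as np

def _inv_sqrtm(A):
    """Inverse matrix square root of a symmetric positive definite
    matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return V @ np.diag(w ** -0.5) @ V.T

def spd_geodesic_distance(A, B):
    """Affine-invariant geodesic distance between SPD matrices:
    the Frobenius norm of log(A^{-1/2} B A^{-1/2}), computed here
    from the eigenvalues of the whitened matrix."""
    Ais = _inv_sqrtm(A)
    M = Ais @ B @ Ais
    w, _ = np.linalg.eigh(M)
    return float(np.sqrt(np.sum(np.log(w) ** 2)))
```

    This distance is zero iff the matrices coincide and is invariant to joint congruence transforms, which is what makes clustering on the manifold (as in the Bag of Covariance Matrices framework) meaningful.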
  • Symmetry-Aware Nonrigid Matching of Incomplete 3D Surfaces Authors: Yusuke Yoshiyasu, Eiichi Yoshida, Kazuhito Yokoi, Ryusuke Sagawa
    We present a nonrigid shape matching technique for establishing correspondences of incomplete 3D surfaces that exhibit intrinsic reflectional symmetry. The key for solving the symmetry ambiguity problem is to use a point-wise local mesh descriptor that has orientation and is thus sensitive to local reflectional symmetry, e.g. discriminating the left hand and the right hand. We devise a way to compute the descriptor orientation by taking the gradients of a scalar field called the average diffusion distance (ADD). Because ADD is smoothly defined on a surface, invariant under isometry/scale and robust to topological errors, the robustness of the descriptor to non-rigid deformations is improved. In addition, we propose a graph matching algorithm called iterative spectral relaxation which combines spectral embedding and spectral graph matching. This formulation allows us to define pairwise constraints in a scale-invariant manner from k-nearest neighbor local pairs such that non-isometric deformations can be robustly handled. Experimental results show that our method can match challenging surfaces with global intrinsic symmetry, data incompleteness and non-isometric deformations.
  • An Automated Estimator of Image Visual Realism Based on Human Cognition Authors: Shaojing Fan, Tian-Tsong Ng, Jonathan S. Herberg, Bryan L. Koenig, Cheston Y.-C. Tan, Rangding Wang
    Assessing the visual realism of images is increasingly becoming an essential aspect of fields ranging from computer graphics (CG) rendering to photo manipulation. In this paper we systematically evaluate factors underlying human perception of visual realism and use that information to create an automated assessment of visual realism. We make the following unique contributions. First, we established a benchmark dataset of images with empirically determined visual realism scores. Second, we identified attributes potentially related to image realism, and used correlational techniques to determine that realism was most related to image naturalness, familiarity, aesthetics, and semantics. Third, we created an attributes-motivated, automated computational model that estimated image visual realism quantitatively. Using human assessment as a benchmark, the model was below human performance, but outperformed other state-of-the-art algorithms.
  • SteadyFlow: Spatially Smooth Optical Flow for Video Stabilization Authors: Shuaicheng Liu, Lu Yuan, Ping Tan, Jian Sun
    We propose a novel motion model, SteadyFlow, to represent the motion between neighboring video frames for stabilization. A SteadyFlow is a specific optical flow with strong spatial coherence enforced, such that smoothing feature trajectories can be replaced by smoothing pixel profiles, which are motion vectors collected at the same pixel location in the SteadyFlow over time. In this way, we can avoid brittle feature tracking in a video stabilization system. Moreover, SteadyFlow is a more general 2D motion model which can deal with spatially-variant motion. We initialize the SteadyFlow from optical flow, then discard discontinuous motions by a spatial-temporal analysis and fill in missing regions by motion completion. Our experiments demonstrate the effectiveness of our stabilization on challenging real-world videos.
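    A pixel profile is just a 1D signal over time, so the smoothing step can be illustrated with a simple moving average; the paper optimizes the smoothed path rather than box-filtering it, so this is only illustrative:

```python
import numpy as np

def smooth_pixel_profile(profile, radius=2):
    """Smooth one pixel profile (motion values collected at a fixed
    pixel location over time) with a moving average, padding the ends
    so the output has the same length as the input."""
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(profile, radius, mode='edge')
    return np.convolve(padded, kernel, mode='valid')
```

    Stabilization then warps each frame by the difference between the original and smoothed profiles.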
  • Joint Summarization of Large-scale Collections of Web Images and Videos for Storyline Reconstruction Authors: Gunhee Kim, Leonid Sigal, Eric P. Xing
    In this paper, we address the problem of jointly summarizing large sets of Flickr images and YouTube videos. Starting from the intuition that the characteristics of the two media types are different yet complementary, we develop a fast and easily-parallelizable approach for creating not only high-quality video summaries but also novel structural summaries of online images as storyline graphs. The storyline graphs can illustrate various events or activities associated with the topic in the form of a branching network. The video summarization is achieved by diversity ranking on the similarity graphs between images and video frames. The reconstruction of storyline graphs is formulated as the inference of sparse time-varying directed graphs from a set of photo streams with the assistance of videos. For evaluation, we collect datasets of 20 outdoor activities, consisting of 2.7M Flickr images and 16K YouTube videos. Due to the large-scale nature of our problem, we evaluate our algorithm via crowdsourcing using Amazon Mechanical Turk. In our experiments, we demonstrate that the proposed joint summarization approach outperforms other baselines and our own methods using videos or images only.
  • Beyond Human Opinion Scores: Blind Image Quality Assessment based on Synthetic Scores Authors: Peng Ye, Jayant Kumar, David Doermann
    State-of-the-art general purpose Blind Image Quality Assessment (BIQA) models rely on examples of distorted images and corresponding human opinion scores to learn a regression function that maps image features to a quality score. These types of models are considered "opinion-aware" (OA) BIQA models. A large set of human scored training examples is usually required to train a reliable OA-BIQA model. However, obtaining human opinion scores through subjective testing is often expensive and time-consuming. It is therefore desirable to develop "opinion-free" (OF) BIQA models that do not require human opinion scores for training. This paper proposes BLISS (Blind Learning of Image Quality using Synthetic Scores). BLISS is a simple, yet effective method for extending OA-BIQA models to OF-BIQA models. Instead of training on human opinion scores, we propose to train BIQA models on synthetic scores derived from Full-Reference (FR) IQA measures. State-of-the-art FR measures yield high correlation with human opinion scores and can serve as approximations to human opinion scores. Unsupervised rank aggregation is applied to combine different FR measures to generate a synthetic score, which serves as a better "gold standard". Extensive experiments on standard IQA datasets show that BLISS significantly outperforms previous OF-BIQA methods and is comparable to state-of-the-art OA-BIQA methods.
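    Unsupervised rank aggregation of several FR measures can be illustrated with a plain Borda count, one simple aggregator; the paper's actual scheme may differ:

```python
import numpy as np

def borda_synthetic_scores(fr_scores):
    """Combine several full-reference quality measures into one
    synthetic score per image by unweighted Borda-count rank
    aggregation. fr_scores has shape (n_measures, n_images), with
    higher meaning better for every measure; the output is the sum
    over measures of each image's rank (0 = worst)."""
    ranks = np.argsort(np.argsort(fr_scores, axis=1), axis=1)
    return ranks.sum(axis=0)
```

    Working in rank space sidesteps the fact that different FR measures live on incomparable numeric scales.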
  • Active Sampling for Subjective Image Quality Assessment Authors: Peng Ye, David Doermann
    Subjective Image Quality Assessment (IQA) is the most reliable way to evaluate the visual quality of digital images perceived by the end user. It is often used to construct image quality datasets and provide the groundtruth for building and evaluating objective quality measures. Subjective tests based on the Mean Opinion Score (MOS) have been widely used in previous studies, but have many known problems such as an ambiguous scale definition and dissimilar interpretations of the scale among subjects. To overcome these limitations, Paired Comparison (PC) tests have been proposed as an alternative and are expected to yield more reliable results. However, PC tests can be expensive and time-consuming, since for n images they require n(n-1)/2 comparisons. We present a hybrid subjective test which combines MOS and PC tests via a unified probabilistic model and an active sampling method. The proposed method actively constructs a set of queries consisting of MOS and PC tests based on the expected information gain provided by each test and can effectively reduce the number of tests required for achieving a target accuracy. Our method can be used in conventional laboratory studies as well as crowdsourcing experiments. Experimental results show the proposed method outperforms state-of-the-art subjective IQA tests in a crowdsourced setting.
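    The quadratic cost of an exhaustive PC test is simply n choose 2; a one-liner makes the growth that motivates active sampling concrete:

```python
from math import comb

def full_pc_test_count(n):
    """Number of pairwise comparisons an exhaustive PC test needs
    for n images: n choose 2 = n(n-1)/2, quadratic in n."""
    return comb(n, 2)
```

    Ten images need 45 comparisons, but a hundred already need 4950, which is why reducing the number of queries matters in practice.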
  • Remote Heart Rate Measurement From Face Videos Under Realistic Situations Authors: Xiaobai Li, Jie Chen, Guoying Zhao, Matti Pietikäinen
    Heart rate is an important indicator of people's physiological state. Recently, several papers reported methods to measure heart rate remotely from face videos. These methods work well on stationary subjects under well-controlled conditions, but their performance significantly degrades if the videos are recorded under more challenging conditions, specifically when subject motion and illumination variations are involved. We propose a framework which utilizes face tracking and Normalized Least Mean Square adaptive filtering methods to counter their influence. We test our framework on MAHNOB-HCI, a large, difficult, public database, and demonstrate that our method substantially outperforms all previous methods. We also use our method for long-term heart rate monitoring in a game evaluation scenario and achieve promising results.
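    The NLMS step can be sketched as a standard adaptive filter that predicts the target signal from a reference input and keeps the prediction error, i.e. the part the reference cannot explain; signal roles and parameter values here are generic, not the paper's:

```python
import numpy as np

def nlms_filter(d, x, order=4, mu=0.5, eps=1e-6):
    """Normalized Least Mean Square adaptive filter: predict the
    desired signal d from reference input x with an adaptive FIR
    filter and return the error signal d - y. The step size is
    normalized by the input energy, hence 'normalized' LMS."""
    w = np.zeros(order)
    e = np.zeros(len(d))
    for n in range(order, len(d)):
        u = x[n - order:n][::-1]                # most recent samples first
        y = w @ u                               # filter prediction
        e[n] = d[n] - y
        w += mu * e[n] * u / (u @ u + eps)      # normalized weight update
    return e
```

    In the remote heart rate setting, the reference would be an illumination-driven signal, so the error retains the pulse-related component.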
  • 6 Seconds of Sound and Vision: Creativity in Micro-Videos Authors: Miriam Redi, Neil O'Hare, Rossano Schifanella, Michele Trevisiol, Alejandro Jaimes
    The notion of creativity, as opposed to related concepts such as beauty or interestingness, has not been studied from the perspective of automatic analysis of multimedia content. Meanwhile, short online videos shared
  • GPS-Tag Refinement using Random Walks with an Adaptive Damping Factor Authors: Amir Roshan Zamir, Shervin Ardeshir, Mubarak Shah
  • 3D Reconstruction from Accidental Motion Authors: Fisher Yu, David Gallup
    We have discovered that 3D reconstruction can be achieved from a single still photographic capture due to accidental motions of the photographer, even while attempting to hold the camera still. Although these motions result in little baseline and therefore high depth uncertainty, in theory, we can combine many such measurements over the duration of the capture process (a few seconds) to achieve usable depth estimates. We present a novel 3D reconstruction system tailored for this problem that produces depth maps from short video sequences from standard cameras without the need for multi-lens optics, active sensors, or intentional motions by the photographer. This result leads to the possibility that depth maps of sufficient quality for RGB-D photography applications like perspective change, simulated aperture, and object segmentation, can come "for free" for a significant fraction of still photographs under reasonable conditions.
  • Is Rotation a Nuisance in Shape Recognition? Authors: Qiuhong Ke, Yi Li
    Rotation in closed contour recognition is a puzzling nuisance in most algorithms. In this paper we address three fundamental issues brought by rotation in shapes: 1) is alignment among shapes necessary? If the answer is "no", 2) how to exploit information in different rotations? and 3) how to use rotation-unaware local features for rotation-aware shape recognition? We argue that the origin of these issues is the use of hand-crafted rotation-unfriendly features and measurements. Therefore our goal is to learn a set of hierarchical features that describe all rotated versions of a shape as a class, with the capability of distinguishing different such classes. We propose to rotate shapes as many times as possible as training samples, and learn the hierarchical feature representation by effectively adopting a convolutional neural network. We further show that our method is very efficient because the network responses of all possible shifted versions of the same shape can be computed effectively by re-using information in the overlapping areas. We tested the algorithm on three real datasets: the Swedish Leaves dataset, ETH-80 Shape, and a subset of the recently collected Leafsnap dataset. Our approach used the curvature scale space and outperformed the state of the art.
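    The training-time idea of treating all rotated versions of a shape as one class amounts to rotation augmentation of the contour points; a minimal sketch:

```python
import numpy as np

def rotated_copies(contour, n_rotations=8):
    """Generate rotated versions of a 2D contour (an (n, 2) array of
    boundary points) to use as training samples, so a classifier
    learns every rotation of the shape as the same class."""
    copies = []
    for k in range(n_rotations):
        a = 2 * np.pi * k / n_rotations
        R = np.array([[np.cos(a), -np.sin(a)],
                      [np.sin(a),  np.cos(a)]])
        copies.append(contour @ R.T)
    return copies
```

    The network then never needs the test shape to be aligned, which is how the paper answers its first question with "no".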
  • Ground Plane Estimation using a Hidden Markov Model Authors: Ralf Dragon, Luc Van Gool
    We focus on the problem of estimating the ground plane orientation and location in monocular video sequences from a moving observer. Our only assumptions are that the 3D ego motion t and the ground plane normal n are orthogonal, and that n and t are smooth over time. We formulate the problem as a state-continuous Hidden Markov Model (HMM) where the hidden state contains t and n and may be estimated by sampling and decomposing homographies. We show that using blocked Gibbs sampling, we can infer the hidden state with high robustness towards outliers, drifting trajectories, rolling shutter and an imprecise intrinsic calibration. Since our approach does not need any initial orientation prior, it works for arbitrary camera orientations in which the ground is visible.