TechTalks from event: CVPR 2014 Video Spotlights

Posters 3B : Biologically Inspired Vision, Low-Level Vision, Medical & Biological Image Analysis, Segmentation

  • 100+ Times Faster Weighted Median Filter (WMF) Authors: Qi Zhang, Li Xu, Jiaya Jia
    Weighted median, in the form of either solver or filter, has been employed in a wide range of computer vision solutions for its beneficial properties in sparsity representation. But it is hard to be accelerated due to the spatially varying weight and the median property. We propose a few efficient schemes to reduce computation complexity from O(r2) to O(r) where r is the kernel size. Our contribution is on a new joint-histogram representation, median tracking, and a new data structure that enables fast data access. The effectiveness of these schemes is demonstrated on optical flow estimation, stereo matching, structure-texture separation, image filtering, to name a few. The running time is largely shortened from several minutes to less than 1 second. The source code is provided in the project website.
  • Weighted Nuclear Norm Minimization with Application to Image Denoising Authors: Shuhang Gu, Lei Zhang, Wangmeng Zuo, Xiangchu Feng
    As a convex relaxation of the low rank matrix factorization problem, the nuclear norm minimization has been attracting significant research interest in recent years. The standard nuclear norm minimization regularizes each singular value equally to pursue the convexity of the objective function. However, this greatly restricts its capability and flexibility in dealing with many practical problems (e.g., denoising), where the singular values have clear physical meanings and should be treated differently. In this paper we study the weighted nuclear norm minimization (WNNM) problem, where the singular values are assigned different weights. The solutions of the WNNM problem are analyzed under different weighting conditions. We then apply the proposed WNNM algorithm to image denoising by exploiting the image nonlocal self-similarity. Experimental results clearly show that the proposed WNNM algorithm outperforms many state-of-the-art denoising algorithms such as BM3D in terms of both quantitative measure and visual perception quality.
  • Blind Image Quality Assessment using Semi-supervised Rectifier Networks Authors: Huixuan Tang, Neel Joshi, Ashish Kapoor
    It is often desirable to evaluate images quality with a perceptually relevant measure that does not require a reference image. Recent approaches to this problem use human provided quality scores with machine learning to learn a measure. The biggest hurdles to these efforts are: 1) the difficulty of generalizing across diverse types of distortions and 2) collecting the enormity of human scored training data that is needed to learn the measure. We present a new blind image quality measure that addresses these difficulties by learning a robust, nonlinear kernel regression function using a rectifier neural network. The method is pre-trained with unlabeled data and fine-tuned with labeled data. It generalizes across a large set of images and distortion types without the need for a large amount of labeled data. We evaluate our approach on two benchmark datasets and show that it not only outperforms the current state of the art in blind image quality estimation, but also outperforms the state of the art in non-blind measures. Furthermore, we show that our semi-supervised approach is robust to using varying amounts of labeled data.
  • Joint Depth Estimation and Camera Shake Removal from Single Blurry Image Authors: Zhe Hu, Li Xu, Ming-Hsuan Yang
    Camera shake during exposure time often results in spatially variant blur effect of the image. The non-uniform blur effect is not only caused by the camera motion, but also the depth variation of the scene. The objects close to the camera sensors are likely to appear more blurry than those at a distance in such cases. However, recent non-uniform deblurring methods do not explicitly consider the depth factor or assume fronto-parallel scenes with constant depth for simplicity. While single image non-uniform deblurring is a challenging problem, the blurry results in fact contain depth information which can be exploited. We propose to jointly estimate scene depth and remove non-uniform blur caused by camera motion by exploiting their underlying geometric relationships, with only single blurry image as input. To this end, we present a unified layer-based model for depth-involved deblurring. We provide a novel layer-based solution using matting to partition the layers and an expectation-maximization scheme to solve this problem. This approach largely reduces the number of unknowns and makes the problem tractable. Experiments on challenging examples demonstrate that both depth and camera shake removal can be well addressed within the unified framework.
  • Deblurring Text Images via L0-Regularized Intensity and Gradient Prior Authors: Jinshan Pan, Zhe Hu, Zhixun Su, Ming-Hsuan Yang
    We propose a simple yet effective $L_0$-regularized prior based on intensity and gradient for text image deblurring. The proposed image prior is motivated by observing distinct properties of text images. Based on this prior, we develop an efficient optimization method to generate reliable intermediate results for kernel estimation. The proposed method does not require any complex filtering strategies to select salient edges which are critical to the state-of-the-art deblurring algorithms. We discuss the relationship with other deblurring algorithms based on edge selection and provide insight on how to select salient edges in a more principled way. In the final latent image restoration step, we develop a simple method to remove artifacts and render better deblurred images. Experimental results demonstrate that the proposed algorithm performs favorably against the state-of-the-art text image deblurring methods. In addition, we show that the proposed method can be effectively applied to deblur low-illumination images.
  • CID: Combined Image Denoising in Spatial and Frequency Domains Using Web Images Authors: Huanjing Yue, Xiaoyan Sun, Jingyu Yang, Feng Wu
    In this paper, we propose a novel two-step scheme to filter heavy noise from images with the assistance of retrieved Web images. There are two key technical contributions in our scheme. First, for every noisy image block, we build two three dimensional (3D) data cubes by using similar blocks in retrieved Web images and similar nonlocal blocks within the noisy image, respectively. To better use their correlations, we propose different denoising strategies. The denoising in the 3D cube built upon the retrieved images is performed as median filtering in the spatial domain, whereas the denoising in the other 3D cube is performed in the frequency domain. These two denoising results are then combined in the frequency domain to produce a denoising image. Second, to handle heavy noise, we further propose using the denoising image to improve image registration of the retrieved Web images, 3D cube building, and the estimation of filtering parameters in the frequency domain. Afterwards, the proposed denoising is performed on the noisy image again to generate the final denoising result. Our experimental results show that when the noise is high, the proposed scheme is better than BM3D by more than 2 dB in PSNR and the visual quality improvement is clear to see.
  • Robust 3D Features for Matching between Distorted Range Scans Captured by Moving Systems Authors: Xiangqi Huang, Bo Zheng, Takeshi Masuda, Katsushi Ikeuchi
    Laser range sensors are often demanded to mount on a moving platform for achieving the good efficiency of 3D reconstruction. However, such moving systems often suffer from the difficulty of matching the distorted range scans. In this paper, we propose novel 3D features which can be robustly extracted and matched even for the distorted 3D surface captured by a moving system. Our feature extraction employs Morse theory to construct Morse functions which capture the critical points approximately invariant to the 3D surface distortion. Then for each critical point, we extract support regions with the maximally stable region defined by extremal region or disconnectivity. Our feature description is designed as two steps: 1) we normalize the detected local regions to canonical shapes for robust matching; 2) we encode each key point with multiple vectors at different Morse function values. In experiments, we demonstrate that the proposed 3D features achieve substantially better performance for distorted surface matching than the state-of-the-art methods.
  • Discriminative Blur Detection Features Authors: Jianping Shi, Li Xu, Jiaya Jia
    Ubiquitous image blur brings out a practically important question � what are effective features to differentiate between blurred and unblurred image regions. We address it by studying a few blur feature representations in image gradient, Fourier domain, and data-driven local filters. Unlike previous methods, which are often based on restoration mechanisms, our features are constructed to enhance discriminative power and are adaptive to various blur scales in images. To avail evaluation, we build a new blur perception dataset containing thousands of images with labeled ground-truth. Our results are applied to several applications, including blur region segmentation, deblurring, and blur magnification.
  • Detection, Rectification and Segmentation of Coplanar Repeated Patterns Authors: James Pritts, Ond?ej Chum, Ji?
    This paper presents a novel and general method for the detection, rectification and segmentation of imaged coplanar repeated patterns. The only assumption made of the scene geometry is that repeated scene elements are mapped to each other by planar Euclidean transformations. The class of patterns covered is broad and includes nearly all commonly seen, planar, man-made repeated patterns. In addition, novel linear constraints are used to reduce geometric ambiguity between the rectified imaged pattern and the scene pattern. Rectification to within a similarity of the scene plane is achieved from one rotated repeat, or to within a similarity with a scale ambiguity along the axis of symmetry from one reflected repeat. A stratum of constraints is derived that gives the necessary configuration of repeats for each successive level of rectification. A generative model for the imaged pattern is inferred and used to segment the pattern with pixel accuracy. Qualitative results are shown on a broad range of image types on which state-of-the-art methods fail.
  • Mirror Symmetry Histograms for Capturing Geometric Properties in Images Authors: Marcelo Cicconet, Davi Geiger, Kristin C Gunsalus, Michael Werman
    We propose a data structure that captures global geometric properties in images: Histogram of Mirror Symmetry Coefficients. We compute such a coefficient for every pair of pixels, and group them in a 6-dimensional histogram. By marginalizing the HMSC in various ways, we develop algorithms for a range of applications: detection of nearly-circular cells; location of the main axis of reflection symmetry; detection of cell-division in movies of developing embryos; detection of worm-tips and indirect cell-counting via supervised classification. Our approach generalizes a series of histogram-related methods, and the proposed algorithms perform with state-of-the-art accuracy.
  • A Learning-to-Rank Approach for Image Color Enhancement Authors: Jianzhou Yan, Stephen Lin, Sing Bing Kang, Xiaoou Tang
    We present a machine-learned ranking approach for automatically enhancing the color of a photograph. Unlike previous techniques that train on pairs of images before and after adjustment by a human user, our method takes into account the intermediate steps taken in the enhancement process, which provide detailed information on the person's color preferences. To make use of this data, we formulate the color enhancement task as a learning-to-rank problem in which ordered pairs of images are used for training, and then various color enhancements of a novel input image can be evaluated from their corresponding rank values. From the parallels between the decision tree structures we use for ranking and the decisions made by a human during the editing process, we posit that breaking a full enhancement sequence into individual steps can facilitate training. Our experiments show that this approach compares well to existing methods for automatic color enhancement.
  • Quality Assessment for Comparing Image Enhancement Algorithms Authors: Zhengying Chen, Tingting Jiang, Yonghong Tian
    As the image enhancement algorithms developed in recent years, how to compare the performances of different image enhancement algorithms becomes a novel task. In this paper, we propose a framework to do quality assessment for comparing image enhancement algorithms. Not like traditional image quality assessment approaches, we focus on the relative quality ranking between enhanced images rather than giving an absolute quality score for a single enhanced image. We construct a dataset which contains source images in bad visibility and their enhanced images processed by different enhancement algorithms, and then do subjective assessment in a pair-wise way to get the relative ranking of these enhanced images. A rank function is trained to fit the subjective assessment results, and can be used to predict ranks of new enhanced images which indicate the relative quality of enhancement algorithms. The experimental results show that our proposed approach statistically outperforms state-of-the-art general-purpose NR-IQA algorithms.
  • Shadow Removal from Single RGB-D Images Authors: Yao Xiao, Efstratios Tsougenis, Chi-Keung Tang
    We present the first automatic method to remove shadows from single RGB-D images. Using normal cues directly derived from depth, we can remove hard and soft shadows while preserving surface texture and shading. Our key assumption is: pixels with similar normals, spatial locations and chromaticity should have similar colors. A modified nonlocal matching is used to compute a shadow confidence map that localizes well hard shadow boundary, thus handling hard and soft shadows within the same framework. We compare our results produced using state-of-the-art shadow removal on single RGB images, and intrinsic image decomposition on standard RGB-D datasets.
  • The Synthesizability of Texture Examples Authors: Dengxin Dai, Hayko Riemenschneider, Luc Van Gool
    Example-based texture synthesis (ETS) has been widely used to generate high quality textures of desired sizes from a small example. However, not all textures are equally well reproducible that way. We predict how synthesizable a particular texture is by ETS. We introduce a dataset (21, 302 textures) of which all images have been annotated in terms of their synthesizability. We design a set of texture features, such as 'textureness', homogeneity, repetitiveness, and irregularity, and train a predictor using these features on the data collection. This work is the first attempt to quantify this image property, and we find that texture synthesizability can be learned and predicted. We use this insight to trim images to parts that are more synthesizable. Also we suggest which texture synthesis method is best suited to synthesise a given texture. Our approach can be seen as 'winner-uses-all': picking one method among several alternatives, ending up with an overall superior ETS method. Such strategy could also be considered for other vision tasks: rather than building an even stronger method, choose from existing methods based on some simple preprocessing.
  • Tracking on the Product Manifold of Shape and Orientation for Tractography from Diffusion MRI Authors: Yuanxiang Wang, Hesamoddin Salehian, Guang Cheng, Baba C. Vemuri
    Tractography refers to the process of tracing out the nerve fiber bundles from diffusion Magnetic Resonance Images (dMRI) data acquired either in vivo or ex-vivo. Tractography is a mature research topic within the field of diffusion MRI analysis, nevertheless, several new methods are being proposed on a regular basis thereby justifying the need, as the problem is not fully solved. Tractography is usually applied to the model (used to represent the diffusion MR signal or a derived quantity) reconstructed from the acquired data. Separating shape and orientation of these models was previously shown to approximately preserve diffusion anisotropy (a useful bio-marker) in the ubiquitous problem of interpolation. However, no further intrinsic geometric properties of this framework were exploited to date in literature. In this paper, we propose a new intrinsic recursive filter on the product manifold of shape and orientation. The recursive filter, dubbed IUKFPro, is a generalization of the unscented Kalman filter (UKF) to this product manifold. The salient contributions of this work are: (1) A new intrinsic UKF for the product manifold of shape and orientation. (2) Derivation of the Riemannian geometry of the product manifold. (3) IUKFPro is tested on synthetic and real data sets from various tractography challenge competitions. From the experimental results, it is evident that IUKFPro performs better than several competing schemes in literature with regards to some of the error measures used in the competitions and is competitive with respect to others.
  • Patch-based Evaluation of Image Segmentation Authors: Christian Ledig, Wenzhe Shi, Wenjia Bai, Daniel Rueckert
    The quantification of similarity between image segmentations is a complex yet important task. The ideal similarity measure should be unbiased to segmentations of different volume and complexity, and be able to quantify and visualise segmentation bias. Similarity measures based on overlap, e.g. Dice score, or surface distances, e.g. Hausdorff distance, clearly do not satisfy all of these properties. To address this problem, we introduce Patch-based Evaluation of Image Segmentation (PEIS), a general method to assess segmentation quality. Our method is based on finding patch correspondences and the associated patch displacements, which allow the estimation of segmentation bias. We quantify both the agreement of the segmentation boundary and the conservation of the segmentation shape. We further assess the segmentation complexity within patches to weight the contribution of local segmentation similarity to the global score. We evaluate PEIS on both synthetic data and two medical imaging datasets. On synthetic segmentations of different shapes, we provide evidence that PEIS, in comparison to the Dice score, produces more comparable scores, has increased sensitivity and estimates segmentation bias accurately. On cardiac magnetic resonance (MR) images, we demonstrate that PEIS can evaluate the performance of a segmentation method independent of the size or complexity of the segmentation under consideration. On brain MR images, we compare five different automatic hippocampus segmentation techniques using PEIS. Finally, we visualise the segmentation bias on a selection of the cases.
  • Evaluation of Scan-Line Optimization for 3D Medical Image Registration Authors: Simon Hermann
    Scan-line optimization via cost accumulation has become very popular for stereo estimation in computer vision applications and is often combined with a semi-global cost integration strategy, known as SGM. This paper introduces this combination as a general and effective optimization technique. It is the first time that this concept is applied to 3D medical image registration. The presented algorithm, SGM-3D, employs a coarse-to-fine strategy and reduces the search space dimension for consecutive pyramid levels by a fixed linear rate. This allows it to handle large displacements to an extent that is required for clinical applications in high dimensional data. SGM-3D is evaluated in context of pulmonary motion analysis on the recently extended DIR-lab benchmark that provides ten 4D computed tomography (CT) image data sets, as well as ten challenging 3D CT scan pairs from the COPDgene study archive. Results show that both registration errors as well as run-time performance are very competitive with current state-of-the-art methods.
  • Classification of Histology Sections via Multispectral Convolutional Sparse Coding Authors: Yin Zhou, Hang Chang, Kenneth Barner, Paul Spellman, Bahram Parvin
    Image-based classification of histology sections plays an important role in predicting clinical outcomes. However this task is very challenging due to the presence of large technical variations (e.g., fixation, staining) and biological heterogeneities (e.g., cell type, cell state). In the field of biomedical imaging, for the purposes of visualization and/or quantification, different stains are typically used for different targets of interest (e.g., cellular/subcellular events), which generates multi-spectrum data (images) through various types of microscopes and, as a result, provides the possibility of learning biological-component-specific features by exploiting multispectral information. We propose a multispectral feature learning model that automatically learns a set of convolution filter banks from separate spectra to efficiently discover the intrinsic tissue morphometric signatures, based on convolutional sparse coding (CSC). The learned feature representations are then aggregated through the spatial pyramid matching framework (SPM) and finally classified using a linear SVM. The proposed system has been evaluated using two large-scale tumor cohorts, collected from The Cancer Genome Atlas (TCGA). Experimental results show that the proposed model 1) outperforms systems utilizing sparse coding for unsupervised feature learning (e.g., PSDSPM [5]); 2) is competitive with systems built upon features with biological prior knowledge (e.g., SMLSPM [4]).
  • Discriminative Sparse Inverse Covariance Matrix: Application in Brain Functional Network Classification Authors: Luping Zhou, Lei Wang, Philip Ogunbona
    Recent studies show that mental disorders change the functional organization of the brain, which could be investigated via various imaging techniques. Analyzing such changes is becoming critical as it could provide new biomarkers for diagnosing and monitoring the progression of the diseases. Functional connectivity analysis studies the covary activity of neuronal populations in different brain regions. The sparse inverse covariance estimation (SICE), also known as graphical LASSO, is one of the most important tools for functional connectivity analysis, which estimates the interregional partial correlations of the brain. Although being increasingly used for predicting mental disorders, SICE is basically a generative method that may not necessarily perform well on classifying neuroimaging data. In this paper, we propose a learning framework to effectively improve the discriminative power of SICEs by taking advantage of the samples in the opposite class. We formulate our objective as convex optimization problems for both one-class and two-class classifications. By analyzing these optimization problems, we not only solve them efficiently in their dual form, but also gain insights into this new learning framework. The proposed framework is applied to analyzing the brain metabolic covariant networks built upon FDG-PET images for the prediction of the Alzheimer's disease, and shows significant improvement of classification performance for both one-class and two-class scenarios. Moreover, as SICE is a general method for learning undirected Gaussian graphical models, this paper has broader meanings beyond the scope of brain research.
  • Learning-Based Atlas Selection for Multiple-Atlas Segmentation Authors: Gerard Sanroma, Guorong Wu, Yaozong Gao, Dinggang Shen
    Recently, multi-atlas segmentation (MAS) has achieved a great success in the medical imaging area. The key assumption of MAS is that multiple atlases encompass richer anatomical variability than a single atlas. Therefore, we can label the target image more accurately by mapping the label information from the appropriate atlas images that have the most similar structures. The problem of atlas selection, however, still remains unexplored. Current state-of-the-art MAS methods rely on image similarity to select a set of atlases. Unfortunately, this heuristic criterion is not necessarily related to segmentation performance and, thus may undermine segmentation results. To solve this simple but critical problem, we propose a learning-based atlas selection method to pick up the best atlases that would eventually lead to more accurate image segmentation. Our idea is to learn the relationship between the pairwise appearance of observed instances (a pair of atlas and target images) and their final labeling performance (in terms of Dice ratio). In this way, we can select the best atlases according to their expected labeling accuracy. It is worth noting that our atlas selection method is general enough to be integrated with existing MAS methods. As is shown in the experiments, we achieve significant improvement after we integrate our method with 3 widely used MAS methods on ADNI and LONI LPBA40 datasets.
  • Object-based Multiple Foreground Video Co-segmentation Authors: Huazhu Fu, Dong Xu, Bao Zhang, Stephen Lin
    We present a video co-segmentation method that uses category-independent object proposals as its basic element and can extract multiple foreground objects in a video set. The use of object elements overcomes limitations of low-level feature representations in separating complex foregrounds and backgrounds. We formulate object-based co-segmentation as a co-selection graph in which regions with foreground-like characteristics are favored while also accounting for intra-video and inter-video foreground coherence. To handle multiple foreground objects, we expand the co-selection graph model into a proposed multi-state selection graph model (MSG) that optimizes the segmentations of different objects jointly. This extension into the MSG can be applied not only to our co-selection graph, but also can be used to turn any standard graph model into a multi-state selection solution that can be optimized directly by the existing energy minimization techniques. Our experiments show that our object-based multiple foreground video co-segmentation method (ObMiC) compares well to related techniques on both single and multiple foreground cases.
  • Efficient Structured Parsing of Fa�ades Using Dynamic Programming Authors: Andrea Cohen, Alexander G. Schwing, Marc Pollefeys
    We propose a sequential optimization technique for segmenting a rectified image of a facade into semantic categories. Our method retrieves a parsing which respects common architectural constraints and also returns a certificate for global optimality. Contrasting the suggested method, the considered facade labeling problem is typically tackled as a classification task or as grammar parsing. Both approaches are not capable of fully exploiting the regularity of the problem. Therefore, our technique very significantly improves the accuracy compared to the state-of-the-art while being an order of magnitude faster. In addition, in 85% of the test images we obtain a certificate for optimality.
  • Dense Semantic Image Segmentation with Objects and Attributes Authors: Shuai Zheng, Ming-Ming Cheng, Jonathan Warrell, Paul Sturgess, Vibhav Vineet, Carsten Rother, Philip H. S. Torr
    The concepts of objects and attributes are both important for describing images precisely, since verbal descriptions often contain both adjectives and nouns (e.g. "I see a shiny red chair'). In this paper, we formulate the problem of joint visual attribute and object class image segmentation as a dense multi-labelling problem, where each pixel in an image can be associated with both an object-class and a set of visual attributes labels. In order to learn the label correlations, we adopt a boosting-based piecewise training approach with respect to the visual appearance and co-occurrence cues. We use a filtering-based mean-field approximation approach for efficient joint inference. Further, we develop a hierarchical model to incorporate region-level object and attribute information. Experiments on the aPASCAL, CORE and attribute augmented NYU indoor scenes datasets show that the proposed approach is able to achieve state-of-the-art results.
  • Semantic Object Selection Authors: Ejaz Ahmed, Scott Cohen, Brian Price
    Interactive object segmentation has great practical importance in computer vision. Many interactive methods have been proposed utilizing user input in the form of mouse clicks and mouse strokes, and often requiring a lot of user intervention. In this paper, we present a system with a far simpler input method: the user needs only give the name of the desired object. With the tag provided by the user we do a text query of an image database to gather exemplars of the object. Using object proposals and borrowing ideas from image retrieval and object detection, the object is localized in the target image. An appearance model generated from the exemplars and the location prior are used in an energy minimization framework to select the object. Our method outperforms the state-of-the-art on existing datasets and on a more challenging dataset we collected.

Orals 4A : Computational Photography: Sensing and Display

  • Fourier Analysis on Transient Imaging with a Multifrequency Time-of-Flight Camera Authors: Jingyu Lin, Yebin Liu, Matthias B. Hullin, Qionghai Dai
    A transient image is the optical impulse response of a scene which visualizes light propagation during an ultra-short time interval. In this paper we discover that the data captured by a multifrequency time-of-flight (ToF) camera is the Fourier transform of a transient image, and identify the sources of systematic error. Based on the discovery we propose a novel framework of frequency-domain transient imaging, as well as algorithms to remove systematic error. The whole process of our approach is of much lower computational cost, especially lower memory usage, than Heide et al.'s approach using the same device. We evaluate our approach on both synthetic and real-datasets.
  • 3D Shape and Indirect Appearance by Structured Light Transport Authors: Matthew O'Toole, John Mather, Kiriakos N. Kutulakos
    We consider the problem of deliberately manipulating the direct and indirect light flowing through a time-varying, fully-general scene in order to simplify its visual analysis. Our approach rests on a crucial link between stereo geometry and light transport: while direct light always obeys the epipolar geometry of a projector-camera pair, indirect light overwhelmingly does not. We show that it is possible to turn this observation into an imaging method that analyzes light transport in real time in the optical domain, prior to acquisition. This yields three key abilities that we demonstrate in an experimental camera prototype: (1) producing a live indirect-only video stream for any scene, regardless of geometric or photometric complexity; (2) capturing images that make existing structured-light shape recovery algorithms robust to indirect transport; and (3) turning them into one-shot methods for dynamic 3D shape capture.
  • Shape-Preserving Half-Projective Warps for Image Stitching Authors: Che-Han Chang, Yoichi Sato, Yung-Yu Chuang
    This paper proposes a novel parametric warp which is a spatial combination of a projective transformation and a similarity transformation. Given the projective transformation relating two input images, based on an analysis of the projective transformation, our method smoothly extrapolates the projective transformation of the overlapping regions into the non-overlapping regions and the resultant warp gradually changes from projective to similarity across the image. The proposed warp has the strengths of both projective and similarity warps. It provides good alignment accuracy as projective warps while preserving the perspective of individual image as similarity warps. It can also be combined with more advanced local-warp-based alignment methods such as the as-projective-as-possible warp for better alignment accuracy. With the proposed warp, the field of view can be extended by stitching images with less projective distortion (stretched shapes and enlarged sizes).
  • Transparent Object Reconstruction via Coded Transport of Intensity Authors: Chenguang Ma, Xing Lin, Jinli Suo, Qionghai Dai, Gordon Wetzstein
    Capturing and understanding visual signals is one of the core interests of computer vision. Much progress has been made w.r.t. many aspects of imaging, but the reconstruction of refractive phenomena, such as turbulence, gas and heat flows, liquids, or transparent solids, has remained a challenging problem. In this paper, we derive an intuitive formulation of light transport in refractive media using light fields and the transport of intensity equation. We show how coded illumination in combination with pairs of recorded images allow for robust computational reconstruction of dynamic two and three-dimensional refractive phenomena.

Orals 4B : Recognition: Detection, Categorization, Classification

  • BING: Binarized Normed Gradients for Objectness Estimation at 300fps Authors: Ming-Ming Cheng, Ziming Zhang, Wen-Yan Lin, Philip Torr
    Training a generic objectness measure to produce a small set of candidate object windows, has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundary can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows in to a small fixed size. Based on this observation and computational reasons, we propose to resize the window to 8 � 8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals. Increasing the numbers of proposals and color spaces for computing BING features, our performance can be further improved to 99.5% DR.