TechTalks from event: ICML 2011

Outlier Detection

  • On the Robustness of Kernel Density M-Estimators Authors: JooSeuk Kim; Clayton Scott
    We analyze a method for nonparametric density estimation that exhibits robustness to contamination of the training sample. This method achieves robustness by combining a traditional kernel density estimator (KDE) with ideas from classical M-estimation. The KDE based on a Gaussian kernel is interpreted as a sample mean in the associated reproducing kernel Hilbert space (RKHS). This mean is estimated robustly through the use of a robust loss, yielding the so-called robust kernel density estimator (RKDE). This robust sample mean can be found via a kernelized iteratively re-weighted least squares (IRWLS) algorithm. Our contributions are summarized as follows. First, we present a representer theorem for the RKDE, which gives insight into the robustness of the RKDE. Second, we provide necessary and sufficient conditions for kernel IRWLS to converge to the global minimizer, in the Gaussian RKHS, of the objective function defining the RKDE. Third, we characterize and provide a method for computing the influence function associated with the RKDE. Fourth, we illustrate the robustness of the RKDE through experiments on several data sets.
  • Learning Multi-View Neighborhood Preserving Projections Authors: Novi Quadrianto; Christoph Lampert
    We address the problem of metric learning for multi-view data, namely the construction of embedding projections from data in different representations into a shared feature space, such that the Euclidean distance in this space provides a meaningful within-view as well as between-view similarity. Our motivation stems from the problem of cross-media retrieval tasks, where the availability of a joint Euclidean distance function is a prerequisite to allow fast, in particular hashing-based, nearest-neighbor queries. We formulate an objective function that expresses the intuitive concept that matching samples are mapped closely together in the output space, whereas non-matching samples are pushed apart, no matter in which view they are available. The resulting optimization problem is not convex, but it can be decomposed explicitly into a convex and a concave part, thereby allowing efficient optimization using the convex-concave procedure. Experiments on an image retrieval task show that nearest-neighbor-based cross-view retrieval is indeed possible, and the proposed technique improves the retrieval accuracy over baseline techniques.
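
The kernelized IRWLS idea in the first abstract (On the Robustness of Kernel Density M-Estimators) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Huber loss as the robust loss, a normalized Gaussian kernel, and a Huber threshold `a` set to the median RKHS residual of the ordinary KDE; that default, the stopping rule, and all function names are illustrative assumptions. The estimate is written as a weighted sum of kernels centered at the samples, and each iteration re-weights the samples by how far their feature maps lie from the current estimate in the RKHS.

```python
import numpy as np


def gaussian_kernel(A, B, sigma):
    """Normalized Gaussian kernel between all rows of A and all rows of B."""
    d = A.shape[1]
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    norm = (2.0 * np.pi * sigma**2) ** (d / 2.0)
    return np.exp(-sq_dist / (2.0 * sigma**2)) / norm


def huber_phi(r, a):
    """phi(r) = psi(r) / r for the Huber loss: 1 if r <= a, a / r otherwise."""
    return np.where(r <= a, 1.0, a / np.maximum(r, 1e-12))


def rkhs_residuals(K, w):
    """RKHS distance from each sample's feature map to f = sum_j w_j k(., x_j)."""
    Kw = K @ w
    sq = np.diag(K) - 2.0 * Kw + w @ Kw
    return np.sqrt(np.maximum(sq, 0.0))


def rkde_weights(X, sigma, a=None, n_iter=200, tol=1e-10):
    """Kernel IRWLS sketch: returns non-negative sample weights summing to 1."""
    K = gaussian_kernel(X, X, sigma)
    n = K.shape[0]
    w = np.full(n, 1.0 / n)  # uniform weights correspond to the ordinary KDE
    if a is None:
        # Assumption: threshold taken as the median residual of the plain KDE.
        a = np.median(rkhs_residuals(K, w))
    for _ in range(n_iter):
        phi = huber_phi(rkhs_residuals(K, w), a)
        w_new = phi / phi.sum()
        converged = np.abs(w_new - w).sum() < tol
        w = w_new
        if converged:
            break
    return w


def rkde_density(Q, X, w, sigma):
    """Evaluate the weighted kernel density estimate at the query points Q."""
    return gaussian_kernel(Q, X, sigma) @ w
```

With uniform weights the estimate reduces to the ordinary KDE. In this sketch, samples that lie far from the bulk of the data in the RKHS receive smaller weights, which is the mechanism by which contamination is suppressed.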