TechTalks from event: ICML 2011

Feature Selection, Dimensionality Reduction

  • Eigenvalue Sensitive Feature Selection Authors: Yi Jiang; Jiangtao Ren
    In recent years, several spectral feature selection methods have been proposed to choose features that best preserve sample similarity. However, when the data contain many irrelevant or noisy features, the similarity matrix constructed from all the unweighted features may not be reliable, which misleads existing spectral feature selection methods into selecting 'wrong' features. To solve this problem, we propose that feature importance be evaluated according to each feature's impact on the similarity matrix: features with high impact on the similarity matrix should be chosen as important ones. Since the graph Laplacian \cite{luxbury2007} is defined on the similarity matrix, the impact of each feature on the similarity matrix is reflected in the change of the graph Laplacian, especially in its eigen-system. Based on this point of view, we propose an Eigenvalue Sensitive Criteria (EVSC) for feature selection, which seeks features with high impact on the graph Laplacian's eigenvalues. Empirical analysis demonstrates that our proposed method outperforms some traditional spectral feature selection methods.
  • Cauchy Graph Embedding Authors: Dijun Luo; Chris Ding; Feiping Nie; Heng Huang
    Laplacian embedding provides a low-dimensional representation for the nodes of a graph where the edge weights denote pairwise similarity among the node objects. It is commonly assumed that the Laplacian embedding preserves the local topology of the original data in the low-dimensional projected subspace, i.e., any pair of graph nodes with large similarity should be embedded close together in the embedded space. However, in this paper we show that the Laplacian embedding often cannot preserve local topology as well as expected. To enhance the local-topology-preserving property in graph embedding, we propose a novel Cauchy graph embedding, which preserves the similarity relationships of the original data in the embedded space via a new objective. Consequently, machine learning tasks (such as k-nearest-neighbor classification) can be easily conducted on the embedded data with better performance. The experimental results on both synthetic and real-world benchmark data sets demonstrate the usefulness of this new type of embedding.
  • Tree preserving embedding Authors: Albert Shieh; Tatsunori Hashimoto; Edo Airoldi
    Visualization techniques for complex data are a workhorse of modern scientific pursuits. The goal of visualization is to embed high dimensional data in a low dimensional space, while preserving structure in the data relevant to exploratory data analysis, such as the existence of clusters. However, existing visualization methods often either entirely fail to preserve clusters in embeddings due to the crowding problem or can only preserve clusters at a single resolution. Here, we develop a new approach to visualization, tree preserving embedding (TPE). Our approach takes advantage of the topological notion of connectedness to provably preserve clusters at all resolutions. Our performance guarantee holds for finite samples, which makes TPE a robust method for applications. Our approach suggests new strategies for robust data visualization in practice.
  • Stochastic Low-Rank Kernel Learning for Regression Authors: Pierre Machart; Thomas Peel; Sandrine Anthoine; Liva Ralaivola; Hervé Glotin
    We present a novel approach to learn a kernel-based regression function. It is based on the use of conical combinations of data-based parameterized kernels and on a new stochastic convex optimization procedure for which we establish convergence guarantees. The overall learning procedure has the nice properties that a) the learned conical combination is automatically designed to perform the regression task at hand and b) the updates required by the optimization procedure are quite inexpensive. In order to shed light on the appositeness of our learning strategy, we present empirical results from experiments conducted on various benchmark datasets.
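
The last abstract refers to "conical combinations" of kernels, meaning weighted sums whose weights are all non-negative. The sketch below only illustrates that notion and is not the authors' learning procedure. The Gaussian base kernels, the `gamma` values, the fixed weights, and all function names are assumptions chosen for the example. In the talk's setting the weights would be learned by the stochastic optimization procedure rather than fixed by hand.

```python
import math


def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; a stand-in base kernel assumed for this example."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(points, kernel):
    """Evaluate the kernel on every pair of points."""
    return [[kernel(x, y) for y in points] for x in points]


def conical_combination(gram_matrices, weights):
    """Return sum_k weights[k] * gram_matrices[k], requiring weights[k] >= 0."""
    if len(gram_matrices) != len(weights):
        raise ValueError("need exactly one weight per Gram matrix")
    if any(w < 0 for w in weights):
        raise ValueError("a conical combination needs non-negative weights")
    n = len(gram_matrices[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, gram_matrices)) for j in range(n)]
        for i in range(n)
    ]


# Toy data and two base kernels with different bandwidths (hypothetical values).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
base_grams = [
    gram_matrix(points, lambda x, y, g=g: gaussian_kernel(x, y, g))
    for g in (0.1, 1.0)
]

# Fixed non-negative weights, used here only to show the combination step.
combined = conical_combination(base_grams, [0.5, 2.0])
```

Because every weight is non-negative and each base Gram matrix is positive semidefinite, the combined matrix is positive semidefinite as well, so the result remains a valid kernel matrix.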