TechTalks from ICML 2011

Clustering

  • On Information-Maximization Clustering: Tuning Parameter Selection and Analytic Solution Authors: Masashi Sugiyama; Makoto Yamada; Manabu Kimura; Hirotaka Hachiya
    Information-maximization clustering learns a probabilistic classifier in an unsupervised manner so that mutual information between feature vectors and cluster assignments is maximized. A notable advantage of this approach is that it only involves continuous optimization of model parameters, which is substantially easier to solve than discrete optimization of cluster assignments. However, existing methods still involve non-convex optimization problems, and therefore finding a good locally optimal solution is not straightforward in practice. In this paper, we propose an alternative information-maximization clustering method based on a squared-loss variant of mutual information. This novel approach gives a clustering solution analytically in a computationally efficient way via kernel eigenvalue decomposition. Furthermore, we provide a practical model selection procedure that allows us to objectively optimize tuning parameters included in the kernel function. Through experiments, we demonstrate the usefulness of the proposed approach.
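    For orientation, below is a minimal sketch of the general recipe this abstract describes: reading cluster assignments off the leading eigenvectors of a kernel matrix. The Gaussian kernel, its width, and the argmax assignment rule are illustrative assumptions, not the authors' exact SMIC derivation or their model selection procedure.

      # Illustrative sketch: clustering from the top eigenvectors of a kernel
      # matrix.  The RBF kernel, sigma, and the argmax assignment are
      # assumptions for illustration, not the exact SMIC procedure.
      import numpy as np

      def kernel_eigen_clustering(X, n_clusters, sigma=1.0):
          # Pairwise squared distances and a Gaussian (RBF) kernel matrix.
          sq = np.sum(X**2, axis=1)
          d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
          K = np.exp(-d2 / (2.0 * sigma**2))
          # Eigendecomposition of the (symmetric) kernel matrix.
          eigvals, eigvecs = np.linalg.eigh(K)       # eigenvalues in ascending order
          V = eigvecs[:, -n_clusters:]               # keep the leading eigenvectors
          # Assign each point to the eigenvector that dominates it.
          return np.argmax(np.abs(V), axis=1)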
  • Pruning nearest neighbor cluster trees Authors: Samory Kpotufe; Ulrike von Luxburg
    Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability? Our first contribution is a statistical analysis that reveals how certain subgraphs of a k-NN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is the following finite sample guarantee. We carefully work out the tradeoff between aggressive and conservative pruning and are able to guarantee the removal of all spurious cluster structures while at the same time guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.
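    To make the object of study concrete, here is a toy sketch of extracting candidate clusters as connected components of a k-NN subgraph restricted to a high-density level set. The k-NN-distance density proxy, the single quantile threshold, and the absence of any pruning step are assumptions for illustration; the paper's estimator and its finite sample pruning rule are considerably more careful.

      # Toy sketch: connected components of a k-NN graph restricted to points
      # whose k-th neighbor distance is small (a crude density proxy).  Only
      # meant to illustrate the kind of subgraph the paper analyzes.
      import numpy as np
      from scipy.sparse import csr_matrix
      from scipy.sparse.csgraph import connected_components
      from scipy.spatial import cKDTree

      def knn_level_set_clusters(X, k=10, level=0.5):
          tree = cKDTree(X)
          dist, idx = tree.query(X, k=k + 1)             # column 0 is the point itself
          r_k = dist[:, -1]                              # distance to the k-th neighbor
          keep = r_k <= np.quantile(r_k, level)          # crude high-density level set
          rows, cols = [], []
          for i in np.where(keep)[0]:                    # edges among retained points only
              for j in idx[i, 1:]:
                  if keep[j]:
                      rows.append(i)
                      cols.append(j)
          A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(X.shape[0],) * 2)
          _, labels = connected_components(A, directed=False)
          labels[~keep] = -1                             # points outside the level set
          return labels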
  • A Co-training Approach for Multi-view Spectral Clustering Authors: Abhishek Kumar; Hal Daume III
    We propose a spectral clustering algorithm for the multi-view setting where we have access to multiple views of the data, each of which can be independently used for clustering. Our spectral clustering algorithm has a flavor of co-training, which is already a widely used idea in semi-supervised learning. We work on the assumption that the true underlying clustering would assign a point to the same cluster irrespective of the view. Hence, we constrain our approach to only search for the clusterings that agree across the views. Our algorithm does not have any hyperparameters to set, which is a major advantage in unsupervised learning. We empirically compare with a number of baseline methods on synthetic and real-world datasets to show the efficacy of the proposed algorithm.
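    A minimal sketch of the co-training flavor described above: the spectral embedding learned in one view is used to smooth the similarity matrix of the other view, and the two views alternate. The projection-based update, the fixed iteration count, and the final k-means step on the stacked embeddings are assumptions for illustration, not the authors' exact algorithm.

      # Illustrative co-training-style multi-view spectral clustering sketch.
      import numpy as np
      from sklearn.cluster import KMeans

      def top_eigvecs(K, c):
          # Leading eigenvectors of the normalized affinity D^{-1/2} K D^{-1/2}.
          d = np.maximum(K.sum(axis=1), 1e-12)
          D = np.diag(1.0 / np.sqrt(d))
          _, vecs = np.linalg.eigh(D @ K @ D)
          return vecs[:, -c:]

      def cotrain_spectral(K1, K2, c, n_iter=5):
          U1, U2 = top_eigvecs(K1, c), top_eigvecs(K2, c)
          for _ in range(n_iter):
              # Use each view's embedding to smooth the other view's similarities.
              S1 = U2 @ (U2.T @ K1)
              S2 = U1 @ (U1.T @ K2)
              K1, K2 = 0.5 * (S1 + S1.T), 0.5 * (S2 + S2.T)
              U1, U2 = top_eigvecs(K1, c), top_eigvecs(K2, c)
          # Final assignment: k-means on the concatenated embeddings.
          return KMeans(n_clusters=c, n_init=10).fit_predict(np.hstack([U1, U2]))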
  • Clusterpath: an Algorithm for Clustering using Convex Fusion Penalties Authors: Toby Hocking; Jean-Philippe Vert; Francis Bach; Armand Joulin
    We present a new clustering algorithm by proposing a convex relaxation of hierarchical clustering, which results in a family of objective functions with a natural geometric interpretation. We give efficient algorithms for calculating the continuous regularization path of solutions, and discuss relative advantages of the parameters. Our method experimentally gives state-of-the-art results similar to spectral clustering for non-convex clusters, and has the added benefit of learning a tree structure from the data.
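    In the notation common for this family of methods (notation ours, not taken from the paper), the convex relaxation takes the form

      \min_{u_1,\dots,u_n \in \mathbb{R}^p} \; \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - u_i \rVert_2^2 \;+\; \lambda \sum_{i<j} w_{ij} \, \lVert u_i - u_j \rVert ,

    where each point x_i gets its own centroid u_i, the w_{ij} are pairwise weights, and the norm in the fusion penalty can be chosen (for example l1, l2, or l-infinity). As lambda grows, centroids fuse together, and the order in which they merge traces out the tree structure mentioned in the abstract.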
  • A Unified Probabilistic Model for Global and Local Unsupervised Feature Selection Authors: Yue Guan; Jennifer Dy; Michael Jordan
    Existing algorithms for joint clustering and feature selection can be categorized as either global or local approaches. Global methods select a single cluster-independent subset of features, whereas local methods select cluster-specific subsets of features. In this paper, we present a unified probabilistic model that can perform both global and local feature selection for clustering. Our approach is based on a hierarchical beta-Bernoulli prior combined with a Dirichlet process mixture model. We obtain global or local feature selection by adjusting the variance of the beta prior. We provide a variational inference algorithm for our model. In addition to simultaneously learning the clusters and features, this Bayesian formulation allows us to learn both the number of clusters and the number of features to retain. Experiments on synthetic and real data show that our unified model can find global and local features and cluster data as well as competing methods of each type.
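    As a schematic of the model family the abstract describes, below is a small generative sampler: a truncated Dirichlet process mixture in which per-cluster binary masks drawn from a beta-Bernoulli hierarchy decide which features carry cluster structure. The truncation level, Gaussian likelihood, and hyperparameters are illustrative assumptions, not the paper's exact specification; it is the variance of the shared beta prior that governs how strongly the per-cluster masks agree, i.e. the global-versus-local trade-off mentioned above.

      # Schematic generative sketch: DP mixture (truncated stick-breaking) with
      # beta-Bernoulli feature-relevance masks.  Purely illustrative.
      import numpy as np

      def sample_dataset(n=200, n_features=10, max_clusters=20,
                         alpha=1.0, a=1.0, b=1.0, seed=0):
          rng = np.random.default_rng(seed)
          # Truncated stick-breaking weights for the DP mixture.
          sticks = rng.beta(1.0, alpha, size=max_clusters)
          w = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
          w /= w.sum()
          # Shared per-feature relevance probabilities (beta prior); per-cluster
          # Bernoulli masks say which features are informative in each cluster.
          pi = rng.beta(a, b, size=n_features)
          masks = rng.random((max_clusters, n_features)) < pi
          means = rng.normal(0.0, 3.0, size=(max_clusters, n_features))
          # Draw cluster labels, then data: irrelevant features are pure noise.
          z = rng.choice(max_clusters, size=n, p=w)
          X = rng.normal(0.0, 1.0, size=(n, n_features)) + masks[z] * means[z]
          return X, z, masks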