TechTalks from event: Conference on Learning Theory

  • Distance Preserving Embeddings for General n-Dimensional Manifolds Authors: Nakul Verma
    Low dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite sample analysis of manifold embedding algorithms largely eludes researchers. Here we present two algorithms that, given access to just the samples, embed the underlying n-dimensional manifold into R^d (where d only depends on some key manifold properties such as its intrinsic dimension, volume and curvature) and guarantee to approximately preserve all interpoint geodesic distances.
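    The abstract does not spell out the construction, but the general recipe it refers to, estimating geodesic distances from the samples and then embedding so those distances are preserved, can be illustrated with a short Isomap-style sketch. To be clear, this is not Verma's algorithm (whose target dimension d depends only on intrinsic manifold properties); the neighborhood size and target dimension below are illustrative choices.
    ```python
    # Isomap-style sketch: estimate geodesics from samples, then embed via
    # classical MDS. NOT the paper's algorithm; illustrative only.
    import numpy as np
    from scipy.sparse.csgraph import shortest_path
    from sklearn.neighbors import kneighbors_graph

    def geodesic_mds_embedding(X, n_neighbors=10, d=2):
        """Embed samples X (N x D) into R^d, approximately preserving
        pairwise geodesic distances estimated from a k-NN graph
        (assumes the k-NN graph is connected)."""
        # 1. Approximate geodesics by shortest paths in a k-NN graph.
        G = kneighbors_graph(X, n_neighbors, mode="distance")
        D_geo = shortest_path(G, method="D", directed=False)
        # 2. Classical MDS on the squared geodesic distance matrix.
        N = X.shape[0]
        J = np.eye(N) - np.ones((N, N)) / N          # centering matrix
        B = -0.5 * J @ (D_geo ** 2) @ J              # Gram matrix estimate
        evals, evecs = np.linalg.eigh(B)
        idx = np.argsort(evals)[::-1][:d]            # top-d eigenpairs
        return evecs[:, idx] * np.sqrt(np.maximum(evals[idx], 0))

    # Example: a noisy circle (1-dimensional manifold) embedded into R^2.
    theta = np.random.uniform(0, 2 * np.pi, 400)
    X = np.c_[np.cos(theta), np.sin(theta), 0.01 * np.random.randn(400)]
    Y = geodesic_mds_embedding(X, n_neighbors=8, d=2)
    ```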
  • A Method of Moments for Mixture Models and Hidden Markov Models Authors: Animashree Anandkumar, Daniel Hsu and Sham M. Kakade
    Mixture models are a fundamental tool in applied statistics and machine learning for treating data taken from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm) which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity which typically scale exponentially with the number of mixture components. This work develops an efficient method of moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it offers a viable alternative to EM for practical deployment.
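    As a concrete illustration of the moment-matching idea (the paper's actual estimator is a spectral method built from low-order moment matrices of multi-view data, not the toy below), here is a one-dimensional method of moments for a two-component, unit-variance Gaussian mixture: match the first three empirical moments and solve for the weight and the two means, with no EM-style local search. All parameter values are illustrative.
    ```python
    # Toy method of moments for a 1-D mixture of two unit-variance Gaussians.
    import numpy as np
    from scipy.optimize import fsolve

    rng = np.random.default_rng(0)

    # Ground truth: weight w, means mu1 and mu2, both variances equal to 1.
    w_true, mu1_true, mu2_true = 0.3, -2.0, 1.5
    n = 200_000
    z = rng.random(n) < w_true
    x = np.where(z, rng.normal(mu1_true, 1, n), rng.normal(mu2_true, 1, n))

    # Empirical raw moments E[X], E[X^2], E[X^3].
    m = np.array([np.mean(x), np.mean(x**2), np.mean(x**3)])

    def moment_equations(params):
        # For N(mu, 1): E[X] = mu, E[X^2] = mu^2 + 1, E[X^3] = mu^3 + 3 mu.
        w, mu1, mu2 = params
        return [
            w * mu1 + (1 - w) * mu2 - m[0],
            w * (mu1**2 + 1) + (1 - w) * (mu2**2 + 1) - m[1],
            w * (mu1**3 + 3 * mu1) + (1 - w) * (mu2**3 + 3 * mu2) - m[2],
        ]

    w_hat, mu1_hat, mu2_hat = fsolve(moment_equations, x0=[0.5, -1.0, 1.0])
    print(f"w={w_hat:.3f}, mu1={mu1_hat:.3f}, mu2={mu2_hat:.3f}")
    ```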
  • A Correlation Clustering Approach to Link Classification in Signed Networks Authors: Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale and Giovanni Zappella
    Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.
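    A minimal sketch of the underlying principle: under perfect social balance the product of signs around every cycle is +1, so a test edge's sign equals the product of training signs along any path between its endpoints. The toy predictor below uses paths in a BFS forest of the training edges; the paper's active algorithms additionally choose which edges to query so that the covering circuits stay short, which this sketch does not attempt.
    ```python
    # Simplified stand-in for circuit-based link classification.
    from collections import deque

    def predict_signs(n, train_edges, test_edges):
        """train_edges: list of (u, v, sign) with sign in {+1, -1};
        test_edges: list of (u, v). Returns predictions (0 if unreachable)."""
        adj = {u: [] for u in range(n)}
        for u, v, s in train_edges:
            adj[u].append((v, s))
            adj[v].append((u, s))
        # BFS per component: label[v] = product of signs on path root -> v.
        label, comp = {}, {}
        for root in range(n):
            if root in label:
                continue
            label[root], comp[root] = 1, root
            q = deque([root])
            while q:
                u = q.popleft()
                for v, s in adj[u]:
                    if v not in label:
                        label[v], comp[v] = label[u] * s, root
                        q.append(v)
        # Path product u -> v equals label[u] * label[v] when connected,
        # since the shared prefix from the root cancels (signs square to 1).
        return [label[u] * label[v] if comp[u] == comp[v] else 0
                for u, v in test_edges]

    # Tiny example: "the enemy of my enemy is my friend".
    print(predict_signs(3, [(0, 1, -1), (1, 2, -1)], [(0, 2)]))  # -> [1]
    ```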
  • Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model Authors: Kamalika Chaudhuri, Fan Chung and Alexander Tsiatas
    In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model, where nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. However, a challenge with applying this approach to graphs generated from the EPP model is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has many low-degree nodes. We resolve this issue by introducing the notion of a degree-corrected graph Laplacian. For graphs with many low-degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degree-corrected random-walk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
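    A sketch of the pipeline described above: regularize every degree by a constant tau before forming the random-walk Laplacian, project onto its bottom k right singular vectors, and run k-means. The exact normalization and choice of tau in the paper may differ; the values below are illustrative assumptions.
    ```python
    # Sketch of degree-corrected spectral clustering (illustrative only).
    import numpy as np
    from sklearn.cluster import KMeans

    def degree_corrected_spectral_clustering(A, k, tau=None):
        """A: symmetric (n x n) adjacency matrix; k: number of clusters."""
        n = A.shape[0]
        deg = A.sum(axis=1)
        if tau is None:
            tau = deg.mean()                      # common default regularizer
        # Degree-corrected random-walk Laplacian: L = I - (D + tau I)^{-1} A.
        L = np.eye(n) - A / (deg + tau)[:, None]
        # Bottom-k right singular vectors of L (smallest singular values).
        _, _, Vt = np.linalg.svd(L)
        V = Vt[-k:].T
        return KMeans(n_clusters=k, n_init=10).fit_predict(V)

    # Example: two planted communities with heterogeneous node degrees.
    rng = np.random.default_rng(1)
    n, k = 200, 2
    labels_true = np.repeat([0, 1], n // 2)
    theta = rng.uniform(0.3, 1.0, n)              # per-node degree propensity
    base = np.where(labels_true[:, None] == labels_true[None, :], 0.20, 0.04)
    P = base * np.outer(theta, theta)
    A = (rng.random((n, n)) < P).astype(float)
    A = np.triu(A, 1); A = A + A.T                # symmetrize, no self-loops
    labels_hat = degree_corrected_spectral_clustering(A, k)
    ```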
  • Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries Authors: Mikhail Belkin, Qichao Que, Yusu Wang and Xueyuan Zhou
    In manifold learning, algorithms based on the graph Laplacian constructed from data have received considerable attention both in practical applications and theoretical analysis. Much of the existing work has been done under the assumption that the data is sampled from a manifold without boundaries and singularities, or that the functions of interest are evaluated away from such points. At the same time, it can be argued that singularities and boundaries are an important aspect of the geometry of realistic data. Boundaries occur whenever the process generating the data has a bounding constraint, while singularities appear when two different manifolds intersect or when a process undergoes a "phase transition", changing non-smoothly as a function of a parameter. In this paper we consider the behavior of graph Laplacians at points at or near boundaries and two main types of other singularities: intersections, where different manifolds come together, and sharp "edges", where a manifold sharply changes direction. We show that the behavior of the graph Laplacian near these singularities is quite different from that in the interior of the manifolds. In fact, a phenomenon somewhat reminiscent of the Gibbs effect in the analysis of Fourier series can be observed in the behavior of the graph Laplacian near such points. Unlike in the interior of the domain, where the graph Laplacian converges to the Laplace-Beltrami operator, near singularities it tends to a first-order differential operator, which exhibits different scaling behavior as a function of the kernel width. One important implication is that while points near the singularities occupy only a small part of the total volume, the difference in scaling results in a disproportionately large contribution to the total behavior. Another significant finding is that while the scaling behavior of the operator is the same near different types of singularities, the limiting operators are very distinct at a more refined level of analysis. We believe that a comprehensive understanding of these structures in addition to the standard case of a smooth manifold can take us a long way toward better methods for analysis of complex non-linear data and can lead to significant progress in algorithm design. Keywords: Graph Laplacian, singularities, limit analysis
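    The scaling claim is easy to see numerically. In the sketch below (a minimal illustration, not the paper's analysis), points lie on a grid on the interval [0, 1], so the only singular points are the two boundary points; for the linear function f(x) = x the interior values of the normalized graph Laplacian vanish (consistent with the Laplace-Beltrami limit, since f'' = 0), while the boundary value grows like f'(0)/h as the kernel width h shrinks.
    ```python
    # Numerical illustration of the boundary effect described above.
    import numpy as np

    n = 2000
    x = np.linspace(0.0, 1.0, n)                   # grid samples from [0, 1]
    f = x                                          # smooth test function, f'' = 0

    for h in [0.05, 0.025, 0.0125]:
        K = np.exp(-(x[:, None] - x[None, :])**2 / (2 * h**2))
        Lf = (f - (K @ f) / K.sum(axis=1)) / h**2  # random-walk Laplacian / h^2
        interior = np.abs(Lf[(x > 0.25) & (x < 0.75)]).max()
        boundary = np.abs(Lf[0])                   # leftmost sample
        print(f"h={h:.4f}  interior max ~ {interior:.2e}  boundary ~ {boundary:.1f}")
    # Interior values vanish, while the boundary value grows like
    # sqrt(2/pi) * f'(0) / h as the kernel width shrinks: a first-order
    # operator with a different scaling in h, as the abstract describes.
    ```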
  • Exact Recovery of Sparsely-Used Dictionaries Authors: Daniel A. Spielman, Huan Wang and John Wright
    We consider the problem of learning sparsely used dictionaries with an arbitrary square dictionary and a random, sparse coefficient matrix. We prove that O(n log n) samples are sufficient to uniquely determine the coefficient matrix. Based on this proof, we design a polynomial-time algorithm, called Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD), and prove that it recovers the dictionary and coefficient matrix with high probability when the coefficient matrix is sufficiently sparse. Simulation results show that ER-SpUD recovers the true dictionary, as well as the coefficients, with higher probability than many state-of-the-art algorithms.
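    A sketch of the single-column flavor of this idea (parameter sizes illustrative, and the deduplication and final dictionary step only indicated in comments): each row of the sparse coefficient matrix is, with high probability, the sparsest vector in the row space of the data, so it can be found by l1 minimization under a linear constraint, which is a linear program.
    ```python
    # Sketch in the spirit of ER-SpUD (single-column variant): each LP
    #   min_w ||w^T Y||_1   subject to  (Y e_j)^T w = 1
    # tends to return w^T Y equal to a (scaled) sparse row of X, because the
    # rows of X are the sparsest vectors in the row space of Y = A X.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    n, p, sparsity = 10, 400, 0.1
    A = rng.standard_normal((n, n))                      # square dictionary
    X = rng.standard_normal((n, p)) * (rng.random((n, p)) < sparsity)
    Y = A @ X

    def sparsest_in_rowspace(Y, r):
        """min ||w^T Y||_1 s.t. r^T w = 1, as an LP with slacks t >= |Y^T w|."""
        n, p = Y.shape
        c = np.r_[np.zeros(n), np.ones(p)]               # minimize sum of t
        A_ub = np.block([[Y.T, -np.eye(p)], [-Y.T, -np.eye(p)]])
        b_ub = np.zeros(2 * p)
        A_eq = np.r_[r, np.zeros(p)][None, :]
        res = linprog(c, A_ub, b_ub, A_eq, [1.0],
                      bounds=[(None, None)] * n + [(0, None)] * p)
        return res.x[:n] @ Y                             # candidate row of X

    # Each constraint vector Y e_j pins down one row; try a few columns j.
    rows = [sparsest_in_rowspace(Y, Y[:, j]) for j in range(20)]
    # (A full implementation deduplicates the candidate rows, stacks n of
    #  them into X_hat, and recovers A_hat = Y @ np.linalg.pinv(X_hat).)
    ```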
  • Near-Optimal Algorithms for Online Matrix Prediction Authors: Elad Hazan, Satyen Kale and Shai Shalev-Shwartz
    In several online prediction problems of recent interest the comparison class is composed of matrices with bounded entries. For example, in the online max-cut problem the comparison class consists of matrices which represent cuts of a given graph, and in online gambling it consists of matrices which represent permutations over n teams. Another important example is online collaborative filtering, in which a widely used comparison class is the set of matrices with small trace norm. In this paper we isolate a property of matrices, which we call (β,τ)-decomposability, and derive an efficient online learning algorithm that enjoys a regret bound of Õ(√(βτT)) for all problems in which the comparison class is composed of (β,τ)-decomposable matrices. By analyzing the decomposability of cut matrices, low trace-norm matrices and triangular matrices, we derive near-optimal regret bounds for online max-cut, online collaborative filtering and online gambling. In particular, this resolves (in the affirmative) an open problem posed by Abernethy (2010) and Kleinberg et al. (2010). Finally, we derive lower bounds for the three problems and show that our upper bounds are optimal up to logarithmic factors. In particular, our lower bound for the online collaborative filtering problem resolves another open problem posed by Shamir and Srebro (2011).
  • Analysis of Thompson Sampling for the Multi-armed Bandit Problem Authors: Shipra Agrawal and Navin Goyal
    The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm: the basic idea is to choose an arm to play according to its probability of being the best arm. Thompson Sampling has experimentally been shown to be close to optimal. In addition, it is efficient to implement and exhibits several desirable properties, such as small regret for delayed feedback. However, theoretical understanding of this algorithm was quite limited. In this paper, for the first time, we show that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem. More precisely, for the stochastic two-armed bandit problem the expected regret in time T is O((ln T)/Δ + 1/Δ³), where Δ is the gap between the means of the two arms. And for the stochastic N-armed bandit problem the expected regret in time T is O([Σ_{i=2..N} 1/Δᵢ²] ln T). Our bounds are optimal except for the dependence on the Δᵢ and the constant factors in the big-Oh.
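    The algorithm itself fits in a few lines. Below is a minimal Beta-Bernoulli version for Bernoulli rewards (uniform Beta(1,1) priors; the horizon and arm means are illustrative): sampling a mean from each posterior and playing the argmax plays each arm exactly with its posterior probability of being the best.
    ```python
    # Minimal Thompson Sampling for Bernoulli-reward bandits with Beta priors.
    import numpy as np

    def thompson_sampling(true_means, T, rng=np.random.default_rng(0)):
        n_arms = len(true_means)
        successes = np.zeros(n_arms)                 # Beta(1, 1) uniform priors
        failures = np.zeros(n_arms)
        regret = 0.0
        for _ in range(T):
            theta = rng.beta(successes + 1, failures + 1)  # posterior samples
            arm = int(np.argmax(theta))
            reward = rng.random() < true_means[arm]        # Bernoulli pull
            successes[arm] += reward
            failures[arm] += 1 - reward
            regret += max(true_means) - true_means[arm]
        return regret

    # Expected regret grows logarithmically in T, as the bound above predicts.
    for T in [1_000, 10_000, 100_000]:
        print(T, thompson_sampling([0.5, 0.6], T))
    ```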
  • Autonomous Exploration For Navigating In MDPs Authors: Shiau Hong Lim and Peter Auer
    While intrinsically motivated learning agents hold considerable promise to overcome limitations of more supervised learning systems, quantitative evaluation and theoretical analysis of such agents are difficult. We propose to consider a restricted setting for autonomous learning where systematic evaluation of learning performance is possible. In this setting the agent needs to learn to navigate in a Markov Decision Process where extrinsic rewards are not present or are ignored. We present a learning algorithm for this scenario and evaluate it by the amount of exploration it uses to learn the environment.
  • Towards Minimax Policies for Online Linear Optimization with Bandit Feedback Authors: Sébastien Bubeck, Nicolò Cesa-Bianchi and Sham M. Kakade
    We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous √d factor compared to previous works, and gives a regret bound of order d√(n log n) for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in d dimensions is the same as for the basic d-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely, we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a d√n regret, thus improving by a factor √(d log n) over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a √(dn log n) regret, again shaving off an extraneous √d compared to previous works.
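    A sketch of the first algorithm's ingredients: exponential weights over a finite action set, with an unbiased least-squares estimate of the full loss vector built from the single observed scalar loss. For simplicity the exploration below is a uniform mixture, whereas the near-optimal bounds require a more careful exploration distribution; the step size and exploration rate are illustrative assumptions.
    ```python
    # Exponential weights for bandit linear optimization over a finite action
    # set (illustrative sketch, uniform-mixture exploration).
    import numpy as np

    def exp2_bandit(actions, losses, eta=0.05, gamma=0.05,
                    rng=np.random.default_rng(0)):
        """actions: (N, d) array of actions; losses: (T, d) loss vectors
        (only the scalar loss of the played action is revealed per round)."""
        N, d = actions.shape
        logw = np.zeros(N)
        total = 0.0
        for loss in losses:
            q = np.exp(logw - logw.max())
            q /= q.sum()
            p = (1 - gamma) * q + gamma / N          # exploration mixture
            i = rng.choice(N, p=p)
            c = actions[i] @ loss                    # bandit feedback only
            total += c
            # Unbiased estimate of the full loss vector: c * P^{-1} a_i,
            # where P = E_p[a a^T] is the second moment of the played action.
            P = (actions.T * p) @ actions
            loss_hat = c * np.linalg.solve(P, actions[i])
            logw -= eta * (actions @ loss_hat)       # exponential-weights update
        return total

    # Example: 2-dimensional actions against a fixed loss direction.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 2))
    L = np.tile([0.3, -0.2], (2_000, 1))
    print(exp2_bandit(A, L))
    ```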