Conference on Learning Theory

Distance Preserving Embeddings for General n-Dimensional Manifolds
Low-dimensional embeddings of manifold data have gained popularity in the last decade. However, a systematic finite-sample analysis of manifold embedding algorithms has largely eluded researchers. Here we present two algorithms that, given access to just the samples, embed the underlying n-dimensional manifold into R^d (where d depends only on key manifold properties such as its intrinsic dimension, volume, and curvature) and are guaranteed to approximately preserve all interpoint geodesic distances.

A Method of Moments for Mixture Models and Hidden Markov Models
Mixture models are a fundamental tool in applied statistics and machine learning for treating data drawn from multiple subpopulations. The current practice for estimating the parameters of such models relies on local search heuristics (e.g., the EM algorithm), which are prone to failure, and existing consistent methods are unfavorable due to their high computational and sample complexity, which typically scale exponentially with the number of mixture components. This work develops an efficient method-of-moments approach to parameter estimation for a broad class of high-dimensional mixture models with many components, including multi-view mixtures of Gaussians (such as mixtures of axis-aligned Gaussians) and hidden Markov models. The new method leads to rigorous unsupervised learning results for mixture models that were not achieved by previous works; and, because of its simplicity, it offers a viable alternative to EM for practical deployment.
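To make the method-of-moments idea concrete, here is a minimal sketch for the simplest possible case: a 1-D mixture of two unit-variance Gaussians with equal weights, where the component means can be read off from the first two empirical moments. This toy setting and all its parameters are illustrative; it is not the paper's multi-view tensor construction.

```python
import numpy as np

def moment_estimate(samples):
    """Method-of-moments estimate for 0.5*N(mu1, 1) + 0.5*N(mu2, 1).

    E[X]   = (mu1 + mu2) / 2
    E[X^2] = (mu1^2 + mu2^2) / 2 + 1
    so mu1, mu2 are the roots of t^2 - s*t + p = 0 with
    s = mu1 + mu2 and p = mu1 * mu2.
    """
    m1 = samples.mean()
    m2 = (samples ** 2).mean()
    s = 2.0 * m1                       # mu1 + mu2
    sum_sq = 2.0 * (m2 - 1.0)          # mu1^2 + mu2^2
    p = (s ** 2 - sum_sq) / 2.0        # mu1 * mu2
    disc = max(s ** 2 - 4.0 * p, 0.0)  # guard tiny negatives from noise
    root = np.sqrt(disc)
    return sorted(((s - root) / 2.0, (s + root) / 2.0))

rng = np.random.default_rng(0)
n = 200_000
z = rng.integers(0, 2, size=n)                    # latent component labels
x = rng.normal(np.where(z == 0, -2.0, 3.0), 1.0)  # true means -2 and 3
mu_lo, mu_hi = moment_estimate(x)
```

Unlike EM, this estimator involves no iteration and no initialization: the parameters are recovered in closed form from the moments, which is the property the paper scales up to many components and high dimension.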

A Correlation Clustering Approach to Link Classification in Signed Networks
Motivated by social balance theory, we develop a theory of link classification in signed networks using the correlation clustering index as a measure of label regularity. We derive learning bounds in terms of correlation clustering within three fundamental transductive learning settings: online, batch, and active. Our main algorithmic contribution is in the active setting, where we introduce a new family of efficient link classifiers based on covering the input graph with small circuits. These are the first active algorithms for link classification with mistake bounds that hold for arbitrary signed networks.

Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
In this paper, we examine a spectral clustering algorithm for similarity graphs drawn from a simple random graph model in which nodes are allowed to have varying degrees, and we provide theoretical bounds on its performance. The random graph model we study is the Extended Planted Partition (EPP) model, a variant of the classical planted partition model. The standard approach to spectral clustering of graphs is to compute the bottom k singular vectors or eigenvectors of a suitable graph Laplacian, project the nodes of the graph onto these vectors, and then use an iterative clustering algorithm on the projected nodes. A challenge in applying this approach to graphs generated from the EPP model, however, is that unnormalized Laplacians do not work, and normalized Laplacians do not concentrate well when the graph has a number of low-degree nodes. We resolve this issue by introducing the notion of a degree-corrected graph Laplacian. For graphs with many low-degree nodes, degree correction has a regularizing effect on the Laplacian. Our spectral clustering algorithm projects the nodes in the graph onto the bottom k right singular vectors of the degree-corrected random-walk Laplacian, and clusters the nodes in this subspace. We show guarantees on the performance of this algorithm, demonstrating that it outputs the correct partition under a wide range of parameter values. Unlike some previous work, our algorithm does not require access to any generative parameters of the model.
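The pipeline described in the abstract can be sketched as follows. The degree-correction constant `tau` (added uniformly to all degrees) and the tiny farthest-point k-means are illustrative choices of mine; the paper's precise correction and clustering step may differ.

```python
import numpy as np

def degree_corrected_spectral_clustering(A, k, tau=None):
    """Form a degree-corrected random-walk Laplacian, project the nodes
    onto its bottom-k right singular vectors, and cluster the projected
    nodes.  tau is an illustrative regularization level."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    if tau is None:
        tau = deg.mean()                 # heuristic correction level
    D_inv = np.diag(1.0 / (deg + tau))
    L = np.eye(n) - D_inv @ A            # degree-corrected random-walk Laplacian
    _, _, Vt = np.linalg.svd(L)
    X = Vt[-k:].T                        # bottom-k right singular vectors
    return kmeans(X, k)

def kmeans(X, k, iters=50):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        d = ((X[:, None] - centers[None]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return labels

# two clean communities of five nodes each
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
labels = degree_corrected_spectral_clustering(A, k=2)
```

On this clean block-structured graph the bottom-2 singular vectors are constant on each community, so the clustering step recovers the planted partition exactly.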

Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries
In manifold learning, algorithms based on the graph Laplacian constructed from data have received considerable attention in both practical applications and theoretical analysis. Much of the existing work has been done under the assumption that the data is sampled from a manifold without boundaries and singularities, or that the functions of interest are evaluated away from such points. At the same time, it can be argued that singularities and boundaries are an important aspect of the geometry of realistic data. Boundaries occur whenever the process generating the data has a bounding constraint, while singularities appear when two different manifolds intersect or when a process undergoes a "phase transition", changing non-smoothly as a function of a parameter. In this paper we consider the behavior of graph Laplacians at points at or near boundaries and at two main types of other singularities: intersections, where different manifolds come together, and sharp "edges", where a manifold sharply changes direction. We show that the behavior of the graph Laplacian near these singularities is quite different from that in the interior of the manifolds. In fact, a phenomenon somewhat reminiscent of the Gibbs effect in the analysis of Fourier series can be observed in the behavior of the graph Laplacian near such points. Unlike in the interior of the domain, where the graph Laplacian converges to the Laplace-Beltrami operator, near singularities the graph Laplacian tends to a first-order differential operator, which exhibits different scaling behavior as a function of the kernel width. One important implication is that while points near the singularities occupy only a small part of the total volume, the difference in scaling results in a disproportionately large contribution to the total behavior. Another significant finding is that while the scaling behavior of the operator is the same near different types of singularities, the operators are very distinct at a more refined level of analysis. We believe that a comprehensive understanding of these structures, in addition to the standard case of a smooth manifold, can take us a long way toward better methods for the analysis of complex nonlinear data and can lead to significant progress in algorithm design. Keywords: graph Laplacian, singularities, limit analysis
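The interior-versus-boundary contrast is easy to see numerically. The sketch below (my own illustration, not the paper's construction; the 2/eps^2 normalization and all parameters are assumptions) applies a Gaussian-kernel graph Laplacian to f(x) = x on a uniform grid over [0, 1]: in the interior the scaled Laplacian approximates f'' = 0, while at the boundary the surviving first-order drift term produces a value of order 1/eps.

```python
import numpy as np

def point_laplacian(x0, x, f, f0, eps):
    """Graph Laplacian of f at x0, scaled by 2/eps^2 so that in the
    interior it approximates the second derivative (illustrative
    normalization)."""
    w = np.exp(-(x - x0) ** 2 / (2.0 * eps ** 2))  # Gaussian kernel weights
    return 2.0 * (w @ f / w.sum() - f0) / eps ** 2

eps = 0.02
x = np.linspace(0.0, 1.0, 5001)
f = x.copy()                                        # f(x) = x, so f'' = 0

interior = point_laplacian(0.5, x, f, 0.5, eps)     # ~ 0, by symmetry
boundary = point_laplacian(0.0, x, f, 0.0, eps)     # ~ 2*sqrt(2/pi)/eps
```

Shrinking `eps` makes the boundary value grow like 1/eps while the interior value stays near f'' = 0, which is exactly the disproportionate boundary contribution the abstract describes.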

Exact Recovery of Sparsely-Used Dictionaries
We consider the problem of learning sparsely used dictionaries with an arbitrary square dictionary and a random, sparse coefficient matrix. We prove that O(n log n) samples are sufficient to uniquely determine the coefficient matrix. Based on this proof, we design a polynomial-time algorithm, called Exact Recovery of Sparsely-Used Dictionaries (ER-SpUD), and prove that it probably recovers the dictionary and coefficient matrix when the coefficient matrix is sufficiently sparse. Simulation results show that ER-SpUD reveals the true dictionary as well as the coefficients with probability higher than many state-of-the-art algorithms.

Near-Optimal Algorithms for Online Matrix Prediction
In several online prediction problems of recent interest, the comparison class is composed of matrices with bounded entries. For example, in the online max-cut problem, the comparison class consists of matrices which represent cuts of a given graph, and in online gambling the comparison class consists of matrices which represent permutations over n teams. Another important example is online collaborative filtering, in which a widely used comparison class is the set of matrices with a small trace norm. In this paper we isolate a property of matrices, which we call (β,τ)-decomposability, and derive an efficient online learning algorithm that enjoys a regret bound of O(√(βτT)) for all problems in which the comparison class is composed of (β,τ)-decomposable matrices. By analyzing the decomposability of cut matrices, low trace-norm matrices, and triangular matrices, we derive near-optimal regret bounds for online max-cut, online collaborative filtering, and online gambling. In particular, this resolves (in the affirmative) an open problem posed by Abernethy (2010) and Kleinberg et al. (2010). Finally, we derive lower bounds for the three problems and show that our upper bounds are optimal up to logarithmic factors. In particular, our lower bound for the online collaborative filtering problem resolves another open problem posed by Shamir and Srebro (2011).

Analysis of Thompson Sampling for the Multi-armed Bandit Problem
The multi-armed bandit problem is a popular model for studying the exploration/exploitation tradeoff in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to play according to its probability of being the best arm. The Thompson Sampling algorithm has experimentally been shown to be close to optimal. In addition, it is efficient to implement and exhibits several desirable properties, such as small regret for delayed feedback. However, theoretical understanding of this algorithm was quite limited. In this paper, for the first time, we show that the Thompson Sampling algorithm achieves logarithmic expected regret for the stochastic multi-armed bandit problem. More precisely, for the stochastic two-armed bandit problem, the expected regret in time T is O((ln T)/Δ + 1/Δ³). And, for the stochastic N-armed bandit problem, the expected regret in time T is O([Σ_{i=2}^{N} 1/Δ_i²] ln T). Our bounds are optimal except for the dependence on Δ_i and the constant factors in the big-Oh.
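The "choose an arm according to its probability of being the best arm" idea has a very short implementation in the standard Bernoulli-reward setting with Beta(1, 1) priors (a common textbook instantiation; the arm probabilities and horizon below are illustrative):

```python
import numpy as np

def thompson_sampling(arm_probs, T, seed=0):
    """Thompson Sampling for Bernoulli arms with Beta(1, 1) priors:
    draw a mean estimate for each arm from its posterior and play the
    arm whose draw is largest -- equivalently, each arm is chosen with
    its posterior probability of being the best arm."""
    rng = np.random.default_rng(seed)
    k = len(arm_probs)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(T):
        theta = rng.beta(successes + 1.0, failures + 1.0)  # posterior draws
        arm = int(theta.argmax())
        reward = rng.random() < arm_probs[arm]
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

pulls = thompson_sampling([0.9, 0.5, 0.4], T=2000)
```

As the posteriors concentrate, suboptimal arms are sampled ever more rarely, which is the mechanism behind the logarithmic regret bound in the abstract.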

Autonomous Exploration For Navigating In MDPs
While intrinsically motivated learning agents hold considerable promise for overcoming the limitations of more supervised learning systems, quantitative evaluation and theoretical analysis of such agents are difficult. We propose a restricted setting for autonomous learning in which systematic evaluation of learning performance is possible. In this setting the agent needs to learn to navigate in a Markov Decision Process where extrinsic rewards are not present or are ignored. We present a learning algorithm for this scenario and evaluate it by the amount of exploration it uses to learn the environment.

Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
We address the online linear optimization problem with bandit feedback. Our contribution is twofold. First, we provide an algorithm (based on exponential weights) with a regret of order √(dn log N) for any finite action set with N actions, under the assumption that the instantaneous loss is bounded by 1. This shaves off an extraneous √d factor compared to previous works, and gives a regret bound of order d√(n log n) for any compact set of actions. Without further assumptions on the action set, this last bound is minimax optimal up to a logarithmic factor. Interestingly, our result also shows that the minimax regret for bandit linear optimization with expert advice in d dimensions is the same as for the basic d-armed bandit with expert advice. Our second contribution is to show how to use the Mirror Descent algorithm to obtain computationally efficient strategies with minimax optimal regret bounds in specific examples. More precisely, we study two canonical action sets: the hypercube and the Euclidean ball. In the former case, we obtain the first computationally efficient algorithm with a d√n regret, thus improving by a factor √(d log n) over the best known result for a computationally efficient algorithm. In the latter case, our approach gives the first algorithm with a √(dn log n) regret, again shaving off an extraneous √d compared to previous works.
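For the special case where the action set is the standard basis of R^d (the "basic d-armed bandit" mentioned above), the exponential-weights-with-bandit-feedback template reduces to the classical Exp3 scheme with importance-weighted loss estimates. The sketch below is that classical special case, not the paper's general algorithm; the learning rate and loss distribution are illustrative.

```python
import numpy as np

def exp3(loss_matrix, eta, seed=0):
    """Exponential weights with importance-weighted loss estimates for
    the d-armed bandit: only the loss of the played arm is observed."""
    rng = np.random.default_rng(seed)
    T, d = loss_matrix.shape
    cum_est = np.zeros(d)          # cumulative estimated losses
    total_loss = 0.0
    for t in range(T):
        w = np.exp(-eta * (cum_est - cum_est.min()))  # stabilized weights
        p = w / w.sum()
        arm = rng.choice(d, p=p)
        loss = loss_matrix[t, arm]
        total_loss += loss
        cum_est[arm] += loss / p[arm]  # unbiased estimate of the played arm
        # unplayed arms get an implicit estimate of 0 this round
    return total_loss

T, d = 3000, 4
rng = np.random.default_rng(1)
losses = rng.random((T, d))
losses[:, 2] *= 0.3                        # arm 2 has the smallest mean loss
eta = np.sqrt(np.log(d) / (T * d))         # standard Exp3-style tuning
alg_loss = exp3(losses, eta)
best_loss = losses.sum(axis=0).min()       # loss of the best fixed arm
```

The importance weighting `loss / p[arm]` keeps the loss estimates unbiased despite observing only one coordinate per round, which is the same mechanism the paper's general finite-action-set algorithm relies on.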
 All Talks
 Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem
 (weak) Calibration is Computationally Hard
 Learning Valuation Functions
 Unified Algorithms for Online Learning and Competitive Analysis
 Online Optimization with Gradual Variations
 The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators
 PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model
 Random Design Analysis of Ridge Regression
 Reconstruction from Anisotropic Random Measurements
 Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and Consequences
 L1 Covering Numbers for Uniformly Bounded Convex Functions
 Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions
 Attribute-Efficient Learning and Weight-Degree Tradeoffs for Polynomial Threshold Functions
 Learning Functions of Halfspaces Using Prefix Covers
 Computational Bounds on Statistical Query Learning
 Learning DNF Expressions from Fourier Spectrum
 Consistency of Nearest Neighbor Classification under Selective Sampling
 Active Learning Using Smooth Relative Regret Approximations with Applications
 Robust Interactive Learning
 Rare Probability Estimation under Regularly Varying Heavy Tails
 Competitive Classification and Closeness Testing
 Kernels Based Tests with Non-asymptotic Bootstrap Approaches for Two-sample Problems
 Differentially Private Online Learning
 Private Convex Empirical Risk Minimization and High-dimensional Regression
 Distributed Learning, Communication Complexity and Privacy
 A Characterization of Scoring Rules for Linear Properties
 Divergences and Risks for Multiclass Experiments
 A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems
 New Bounds for Learning Intervals with Implications for Semi-Supervised Learning
 Tight Bounds on Proper Equivalence Query Learning of DNF
 Distance Preserving Embeddings for General n-Dimensional Manifolds
 A Method of Moments for Mixture Models and Hidden Markov Models
 A Correlation Clustering Approach to Link Classification in Signed Networks
 Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
 Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries
 Exact Recovery of Sparsely-Used Dictionaries
 Near-Optimal Algorithms for Online Matrix Prediction
 Analysis of Thompson Sampling for the Multi-armed Bandit Problem
 Autonomous Exploration For Navigating In MDPs
 Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
 The Best of Both Worlds: Stochastic and Adversarial Bandits
 Open Problem: Regret Bounds for Thompson Sampling
 Open Problem: Better Bounds for Online Logistic Regression
 Open Problem: Learning Dynamic Network Models from a Static Snapshot
 Open Problem: Does AdaBoost Always Cycle?
 Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?