Conference on Learning Theory
TechTalks from the Conference on Learning Theory

Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem

This paper introduces the Furthest Hyperplane Problem (FHP), an unsupervised counterpart of Support Vector Machines. Given a set of n points in R^d, the objective is to produce the hyperplane (passing through the origin) that maximizes the separation margin, that is, the minimal distance between the hyperplane and any input point. To the best of our knowledge, this is the first paper achieving provable results regarding FHP. We provide both lower and upper bounds for this NP-hard problem. First, we give a simple randomized algorithm whose running time is n^O(1/θ²), where θ is the optimal separation margin. We show that its exponential dependency on 1/θ² is tight, up to subpolynomial factors, assuming SAT cannot be solved in subexponential time. Next, we give an efficient approximation algorithm. For any α ∈ [0, 1], the algorithm produces a hyperplane whose distance from at least a 1 − 3α fraction of the points is at least α times the optimal separation margin. Finally, we show that FHP does not admit a PTAS by presenting a gap-preserving reduction from a particular version of the PCP theorem.
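The FHP objective defined above — maximize the minimal distance from any input point to a hyperplane through the origin — can be written down directly. A minimal sketch (function names and the naive random-search baseline are ours, for illustration only; the paper's randomized algorithm is more refined than sampling uniform random directions):

```python
import numpy as np

def margin(w, X):
    """Separation margin of the hyperplane through the origin with normal w:
    the minimal distance from any row of X to the hyperplane."""
    w = w / np.linalg.norm(w)
    return np.abs(X @ w).min()

def random_search_fhp(X, trials=2000, seed=0):
    """Illustrative baseline: sample random unit normals and keep the one
    achieving the largest margin."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_w, best_m = None, -1.0
    for _ in range(trials):
        w = rng.standard_normal(d)
        m = margin(w, X)
        if m > best_m:
            best_w, best_m = w / np.linalg.norm(w), m
    return best_w, best_m
```

Note that, unlike supervised SVMs, the points carry no labels: the margin is measured by absolute distance, so points may lie on either side of the hyperplane.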

(weak) Calibration is Computationally Hard

We show that the existence of a computationally efficient calibration algorithm, with a low weak calibration rate, would imply the existence of an efficient algorithm for computing approximate Nash equilibria, thus implying the unlikely conclusion that every problem in PPAD is solvable in polynomial time.

Learning Valuation Functions

A core element of microeconomics and game theory is that consumers have valuation functions over bundles of goods and that these valuation functions drive their purchases. A common assumption is that these functions are subadditive, meaning that the value given to a bundle is at most the sum of the values of the individual items. In this paper, we provide nearly tight guarantees on the efficient learnability of subadditive valuations. We also provide nearly tight bounds for the subclass of XOS (fractionally subadditive) valuations, also widely used in the literature. We additionally leverage the structure of valuations in a number of interesting subclasses and obtain algorithms with stronger learning guarantees.
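The subadditivity property underlying this abstract — v(S ∪ T) ≤ v(S) + v(T) for all bundles S, T — can be checked exhaustively on a small ground set. A minimal sketch (names are ours; valuations are represented as a dict from frozensets of items to values):

```python
from itertools import combinations

def bundles(items):
    """All subsets of the ground set, as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_subadditive(v, items):
    """Check v(S | T) <= v(S) + v(T) for every pair of bundles S, T."""
    for S in bundles(items):
        for T in bundles(items):
            if v[S | T] > v[S] + v[T] + 1e-12:
                return False
    return True
```

For example, any additive valuation (bundle value equals the sum of item values) passes this check, while a valuation that rewards a bundle strictly more than the sum of its parts fails it.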

Unified Algorithms for Online Learning and Competitive Analysis

Online learning and competitive analysis are two widely studied frameworks for online decision-making settings. Despite the frequent similarity of the problems they study, there are significant differences in their assumptions, goals, and techniques, hindering a unified analysis and richer interplay between the two. In this paper, we provide several contributions in this direction. We provide a single unified algorithm which, by parameter tuning, interpolates between optimal regret for learning from experts (in online learning) and optimal competitive ratio for the metrical task systems (MTS) problem (in competitive analysis), improving on the results of Blum and Burch (1997). The algorithm also allows us to obtain new regret bounds against "drifting" experts, which might be of independent interest. Moreover, our approach allows us to go beyond experts/MTS, obtaining similar unifying results for structured action sets and "combinatorial experts", whenever the setting has a certain matroid structure.
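One endpoint of the interpolation described above is the classical experts setting, where optimal regret is achieved by multiplicative weights ("Hedge"). A minimal sketch of that standard baseline (this is the textbook algorithm, not the paper's unified algorithm, which additionally covers the MTS endpoint):

```python
import numpy as np

def hedge(losses, eta):
    """Multiplicative-weights ('Hedge') forecaster for prediction with
    expert advice.  losses is a (T, n) array of per-round expert losses
    in [0, 1]; returns the algorithm's total expected loss."""
    T, n = losses.shape
    w = np.ones(n)
    total = 0.0
    for t in range(T):
        p = w / w.sum()                # play experts proportionally to weight
        total += p @ losses[t]         # expected loss this round
        w *= np.exp(-eta * losses[t])  # exponentially downweight bad experts
    return total
```

With a good expert in the pool, the forecaster's total loss exceeds the best expert's by only an additive regret term, roughly ln(n)/eta plus a term growing with eta.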

Online Optimization with Gradual Variations

We study the online convex optimization problem, in which an online algorithm has to make repeated decisions with convex loss functions and hopes to achieve a small regret. We consider a natural restriction of this problem in which the loss functions have a small deviation, measured by the sum of the distances between every two consecutive loss functions, according to some distance metric. We show that for linear and general smooth convex loss functions, an online algorithm modified from the gradient descent algorithm can achieve a regret which scales only with the square root of the deviation. For the closely related problem of prediction with expert advice, we show that an online algorithm modified from the multiplicative update algorithm can also achieve a similar regret bound for a different measure of deviation. Finally, for loss functions which are strictly convex, we show that an online algorithm modified from the online Newton step algorithm can achieve a regret which is only logarithmic in the deviation; as an application, we also obtain such a logarithmic regret for the portfolio management problem.
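The template being modified here is standard online gradient descent: play a point, observe the current loss's gradient there, and step. A minimal sketch of that unmodified template (names are ours; the paper's variant alters this scheme so that regret scales with the deviation of the loss sequence rather than with the horizon):

```python
import numpy as np

def ogd(grad_fns, x0, eta, project=lambda x: x):
    """Standard online (projected) gradient descent: grad_fns[t] returns
    the gradient of the round-t loss at the played point."""
    x = np.asarray(x0, dtype=float)
    plays = []
    for grad in grad_fns:
        plays.append(x.copy())            # commit to x before seeing the loss
        x = project(x - eta * grad(x))    # gradient step, then projection
    return plays
```

When consecutive losses barely change (small deviation), the iterates stabilize quickly; the paper quantifies this intuition with regret bounds in terms of the total deviation.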

The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators

We study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends in general on the number n of rounds of the game. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of n. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.
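Jeffreys prior is proportional to the square root of the Fisher information. For concreteness, a minimal worked example for the Bernoulli model (this standard computation is ours, chosen only to make the prior tangible; the paper's results concern general regular parametric models):

```python
import numpy as np

def fisher_info_bernoulli(p):
    """Fisher information of Bernoulli(p), as the expected squared score
    E[(d/dp log f(X; p))^2]; closed form is 1 / (p (1 - p))."""
    score1 = 1.0 / p         # d/dp log p        (when X = 1)
    score0 = -1.0 / (1 - p)  # d/dp log (1 - p)  (when X = 0)
    return p * score1 ** 2 + (1 - p) * score0 ** 2

def jeffreys_density_bernoulli(p):
    """Unnormalized Jeffreys prior sqrt(I(p)); for Bernoulli this is the
    Beta(1/2, 1/2) shape p^(-1/2) (1 - p)^(-1/2)."""
    return np.sqrt(fisher_info_bernoulli(p))
```

The Bernoulli model is a canonical exponential family, so it falls under the previously known case; the paper extends the optimality of this prior to models where the maximum likelihood estimator is merely asymptotically normal.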

PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model

We develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL), an estimation method for the sparse additive model. Standard analyses of MKL require a strong condition on the design, analogous to the restricted eigenvalue condition in the analysis of the Lasso and the Dantzig selector. In this paper, we apply the PAC-Bayesian technique to show that the Bayesian variant of MKL achieves the optimal convergence rate without such strong conditions on the design. Essentially, our approach is a combination of PAC-Bayes and recently developed theories of nonparametric Gaussian process regression. Our bound is developed in the fixed design situation. Our analysis includes the existing result on Gaussian process regression as a special case, and the proof is much simpler by virtue of the PAC-Bayesian technique. We also give the convergence rate of the Bayesian variant of the Group Lasso as a finite-dimensional special case.

Random Design Analysis of Ridge Regression

This work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting, under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the "out-of-sample" prediction error, as opposed to the "in-sample" (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors, neither of which is present in the fixed design setting. The proofs of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
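Both estimators analyzed above have one closed form, differing only in the regularization parameter. A minimal sketch (standard formula; the function name is ours):

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge regression estimator beta = (X^T X + lam * I)^{-1} X^T y.
    Setting lam = 0 recovers ordinary least squares whenever X^T X is
    invertible."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
```

The random design question studied in the paper is how well this fitted coefficient vector predicts on fresh draws of the covariates, rather than on the training design itself.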

Reconstruction from Anisotropic Random Measurements

Random matrices are widely used in sparse recovery problems, and the relevant properties of matrices with i.i.d. entries are well understood. The current paper discusses the recently introduced Restricted Eigenvalue (RE) condition, which is among the most general assumptions on the matrix guaranteeing recovery. We prove a reduction principle showing that the RE condition can be guaranteed by checking the restricted isometry on a certain family of low-dimensional subspaces. This principle allows us to establish the RE condition for several broad classes of random matrices with dependent entries, including random matrices with subgaussian rows and non-trivial covariance structure, as well as matrices with independent rows and uniformly bounded entries.
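"Restricted isometry" means the measurement matrix nearly preserves the norm of all sparse vectors. A minimal numerical sketch of that notion (the sampling-based estimator and its names are ours; it only illustrates the property, it is not the paper's verification procedure, which checks a family of low-dimensional subspaces deterministically):

```python
import numpy as np

def max_isometry_distortion(A, s, trials=200, seed=0):
    """Empirically estimate how far A is from an isometry on s-sparse
    vectors: the largest observed | ||Ax||^2 / ||x||^2 - 1 | over random
    sparse test vectors."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(d)
        support = rng.choice(d, size=s, replace=False)
        x[support] = rng.standard_normal(s)   # random s-sparse test vector
        ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst
```

For a Gaussian matrix with enough rows (scaled by 1/sqrt(m)), the observed distortion on sparse vectors stays small; the paper's point is that such guarantees extend to anisotropic and dependent measurement ensembles.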

Toward a Noncommutative Arithmetic-Geometric Mean Inequality: Conjectures, Case-Studies, and Consequences

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization; examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress toward theoretically evaluating the difference in performance between sampling with and without replacement in such algorithms. Focusing on least mean squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices and for some pathological examples as well. We provide a deterministic worst-case bound on the discrepancy between the two sampling models, and explore some of the impediments to proving this inequality in full generality. We detail the consequences of this inequality for stochastic gradient descent and the randomized Kaczmarz algorithm for solving linear systems.
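The randomized Kaczmarz algorithm mentioned above makes the two sampling models concrete: each step projects the iterate onto the solution set of one sampled equation, and the only choice is how rows are sampled. A minimal sketch (names are ours; "without replacement" is implemented as fresh random permutations per epoch):

```python
import numpy as np

def kaczmarz(A, b, x0, iters, without_replacement, seed=0):
    """Randomized Kaczmarz for a consistent system Ax = b: at each step,
    project the iterate onto the hyperplane {x : A[i] @ x = b[i]} of one
    sampled row i."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    x = np.asarray(x0, dtype=float)
    order = []
    for _ in range(iters):
        if without_replacement:
            if not order:                       # start a new epoch
                order = list(rng.permutation(n))
            i = order.pop()
        else:
            i = rng.integers(n)                 # i.i.d. with replacement
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a    # orthogonal projection
    return x
```

Both variants converge on consistent systems; the conjectured arithmetic-geometric mean inequality would show the without-replacement (epoch) variant converges at least as fast in expectation.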
 All Talks
 Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem
 (weak) Calibration is Computationally Hard
 Learning Valuation Functions
 Unified Algorithms for Online Learning and Competitive Analysis
 Online Optimization with Gradual Variations
 The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators
 PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model
 Random Design Analysis of Ridge Regression
 Reconstruction from Anisotropic Random Measurements
 Toward a Noncommutative Arithmetic-Geometric Mean Inequality: Conjectures, Case-Studies, and Consequences
 L1 Covering Numbers for Uniformly Bounded Convex Functions
 Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions
 Attribute-Efficient Learning and Weight-Degree Tradeoffs for Polynomial Threshold Functions
 Learning Functions of Halfspaces Using Prefix Covers
 Computational Bounds on Statistical Query Learning
 Learning DNF Expressions from Fourier Spectrum
 Consistency of Nearest Neighbor Classification under Selective Sampling
 Active Learning Using Smooth Relative Regret Approximations with Applications
 Robust Interactive Learning
 Rare Probability Estimation under Regularly Varying Heavy Tails
 Competitive Classification and Closeness Testing
 Kernels Based Tests with Non-asymptotic Bootstrap Approaches for Two-sample Problems
 Differentially Private Online Learning
 Private Convex Empirical Risk Minimization and High-dimensional Regression
 Distributed Learning, Communication Complexity and Privacy
 A Characterization of Scoring Rules for Linear Properties
 Divergences and Risks for Multiclass Experiments
 A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems
 New Bounds for Learning Intervals with Implications for Semi-Supervised Learning
 Tight Bounds on Proper Equivalence Query Learning of DNF
 Distance Preserving Embeddings for General n-Dimensional Manifolds
 A Method of Moments for Mixture Models and Hidden Markov Models
 A Correlation Clustering Approach to Link Classification in Signed Networks
 Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
 Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries
 Exact Recovery of Sparsely-Used Dictionaries
 Near-Optimal Algorithms for Online Matrix Prediction
 Analysis of Thompson Sampling for the Multi-armed Bandit Problem
 Autonomous Exploration For Navigating In MDPs
 Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
 The Best of Both Worlds: Stochastic and Adversarial Bandits
 Open Problem: Regret Bounds for Thompson Sampling
 Open Problem: Better Bounds for Online Logistic Regression
 Open Problem: Learning Dynamic Network Models from a Static Snapshot
 Open Problem: Does AdaBoost Always Cycle?
 Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?