Conference on Learning Theory
TechTalks from event: Conference on Learning Theory
-
Unsupervised SVMs: On the Complexity of the Furthest Hyperplane ProblemThis paper introduces the Furthest Hyperplane Problem (FHP), which is an unsupervised counterpart of Support Vector Machines. Given a set of n points in Rd, the objective is to produce the hyperplane (passing through the origin) which maximizes the separation margin, that is, the minimal distance between the hyperplane and any input point. To the best of our knowledge, this is the first paper achieving provable results regarding FHP. We provide both lower and upper bounds to this NP-hard problem. First, we give a simple randomized algorithm whose running time is nO(1/?2) where ? is the optimal separation margin. We show that its exponential dependency on 1/?2 is tight, up to sub-polynomial factors, assuming SAT cannot be solved in sub-exponential time. Next, we give an efficient approximation algorithm. For any ? ? [0, 1], the algorithm produces a hyperplane whose distance from at least 1 - 3? fraction of the points is at least ? times the optimal separation margin. Finally, we show that FHP does not admit a PTAS by presenting a gap preserving reduction from a particular version of the PCP theorem.
-
(weak) Calibration is Computationally HardWe show that the existence of a computationally efficient calibration algorithm, with a low weak calibration rate, would imply the existence of an efficient algorithm for computing approximate Nash equilibria -- thus implying the unlikely conclusion that every problem in PPAD is solvable in polynomial time.
-
Learning Valuation FunctionsA core element of microeconomics and game theory is that consumers have valuation functions over bundles of goods and that these valuations functions drive their purchases. A common assumption is that these functions are subadditive meaning that the value given to a bundle is at most the sum of values on the individual items. In this paper, we provide nearly tight guarantees on the efficient learnability of subadditive valuations. We also provide nearly tight bounds for the subclass of XOS (fractionally subadditive) valuations, also widely used in the literature. We additionally leverage the structure of valuations in a number of interesting subclasses and obtain algorithms with stronger learning guarantees.
-
Unified Algorithms for Online Learning and Competitive AnalysisOnline learning and competitive analysis are two widely studied frameworks for online decisionmaking settings. Despite the frequent similarity of the problems they study, there are significant differences in their assumptions, goals and techniques, hindering a unified analysis and richer interplay between the two. In this paper, we provide several contributions in this direction. We provide a single unified algorithm which by parameter tuning, interpolates between optimal regret for learning from experts (in online learning) and optimal competitive ratio for the metrical task systems problem (MTS) (in competitive analysis), improving on the results of Blum and Burch (1997). The algorithm also allows us to obtain new regret bounds against "drifting" experts, which might be of independent interest. Moreover, our approach allows us to go beyond experts/MTS, obtaining similar unifying results for structured action sets and "combinatorial experts", whenever the setting has a certain matroid structure.
-
Online Optimization with Gradual VariationsWe study the online convex optimization problem, in which an online algorithm has to make repeated decisions with convex loss functions and hopes to achieve a small regret. We consider a natural restriction of this problem in which the loss functions have a small deviation, measured by the sum of the distances between every two consecutive loss functions, according to some distance metrics. We show that for the linear and general smooth convex loss functions, an online algorithm modified from the gradient descend algorithm can achieve a regret which only scales as the square root of the deviation. For the closely related problem of prediction with expert advice, we show that an online algorithm modified from the multiplicative update algorithm can also achieve a similar regret bound for a different measure of deviation. Finally, for loss functions which are strictly convex, we show that an online algorithm modified from the online Newton step algorithm can achieve a regret which is only logarithmic in terms of the deviation, and as an application, we can also have such a logarithmic regret for the portfolio management problem.
-
The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood EstimatorsWe study online learning under logarithmic loss with regular parametric models. We show that a Bayesian strategy predicts optimally only if it uses Jeffreys prior. This result was known for canonical exponential families; we extend it to parametric models for which the maximum likelihood estimator is asymptotically normal. The optimal prediction strategy, normalized maximum likelihood, depends on the number n of rounds of the game, in general. However, when a Bayesian strategy is optimal, normalized maximum likelihood becomes independent of n. Our proof uses this to exploit the asymptotics of normalized maximum likelihood. The asymptotic normality of the maximum likelihood estimator is responsible for the necessity of Jeffreys prior.
-
PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive ModelWe develop a PAC-Bayesian bound for the convergence rate of a Bayesian variant of Multiple Kernel Learning (MKL) that is an estimation method for the sparse additive model. Standard analyses for MKL require a strong condition on the design analogous to the restricted eigenvalue condition for the analysis of Lasso and Dantzig selector. In this paper, we apply PAC-Bayesian technique to show that the Bayesian variant of MKL achieves the optimal convergence rate without such strong conditions on the design. Basically our approach is a combination of PAC-Bayes and recently developed theories of non-parametric Gaussian process regressions. Our bound is developed in a fixed design situation. Our analysis includes the existing result of Gaussian process as a special case and the proof is much simpler by virtue of PAC-Bayesian technique. We also give the convergence rate of the Bayesian variant of Group Lasso as a finite dimensional special case.
-
Random Design Analysis of Ridge RegressionThis work gives a simultaneous analysis of both the ordinary least squares estimator and the ridge regression estimator in the random design setting under mild assumptions on the covariate/response distributions. In particular, the analysis provides sharp results on the "out-of-sample" prediction error, as opposed to the "in-sample" (fixed design) error. The analysis also reveals the effect of errors in the estimated covariance structure, as well as the effect of modeling errors; neither of which effects are present in the fixed design setting. The proof of the main results are based on a simple decomposition lemma combined with concentration inequalities for random vectors and matrices.
-
Reconstruction from Anisotropic Random MeasurementsRandom matrices are widely used in sparse recovery problems, and the relevant properties of matrices with i.i.d. entries are well understood. The current paper discusses the recently introduced Restricted Eigenvalue (RE) condition, which is among the most general assumptions on the matrix, guaranteeing recovery. We prove a reduction principle showing that the RE condition can be guaranteed by checking the restricted isometry on a certain family of low-dimensional subspaces. This principle allows us to establish the RE condition for several broad classes of random matrices with dependent entries, including random matrices with subgaussian rows and non-trivial covariance structure, as well as matrices with independent rows, and uniformly bounded entries.
-
Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and ConsequencesRandomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress at theoretically evaluating the difference in performance between sampling with- and without-replacement in such algorithms. Focusing on least means squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices and for some pathological examples as well. We provide a deterministic worst-case bound on the gap between the discrepancy between the two sampling models, and explore some of the impediments to proving this inequality in full generality. We detail the consequences of this inequality for stochastic gradient descent and the randomized Kaczmarz algorithm for solving linear systems.
- All Talks
- Unsupervised SVMs: On the Complexity of the Furthest Hyperplane Problem
- (weak) Calibration is Computationally Hard
- Learning Valuation Functions
- Unified Algorithms for Online Learning and Competitive Analysis
- Online Optimization with Gradual Variations
- The Optimality of Jeffreys Prior for Online Density Estimation and the Asymptotic Normality of Maximum Likelihood Estimators
- PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model
- Random Design Analysis of Ridge Regression
- Reconstruction from Anisotropic Random Measurements
- Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and Consequences
- L1 Covering Numbers for Uniformly Bounded Convex Functions
- Generalization Bounds for Online Learning Algorithms with Pairwise Loss Functions
- Attribute-Efficient Learning andWeight-Degree Tradeoffs for Polynomial Threshold Functions
- Learning Functions of Halfspaces Using Prefix Covers
- Computational Bounds on Statistical Query Learning
- Learning DNF Expressions from Fourier Spectrum
- Consistency of Nearest Neighbor Classification under Selective Sampling
- Active Learning Using Smooth Relative Regret Approximations with Applications
- Robust Interactive Learning
- Rare Probability Estimation under Regularly Varying Heavy Tails
- Competitive Classification and Closeness Testing
- Kernels Based Tests with Non-asymptotic Bootstrap Approaches for Two-sample Problems
- Differentially Private Online Learning
- Private Convex Empirical Risk Minimization and High-dimensional Regression
- Distributed Learning, Communication Complexity and Privacy
- A Characterization of Scoring Rules for Linear Properties
- Divergences and Risks for Multiclass Experiments
- A Conjugate Property between Loss Functions and Uncertainty Sets in Classification Problems
- New Bounds for Learning Intervals with Implications for Semi-Supervised Learning
- Tight Bounds on Proper Equivalence Query Learning of DNF
- Distance Preserving Embeddings for General n-Dimensional Manifolds
- A Method of Moments for Mixture Models and Hidden Markov Models
- A Correlation Clustering Approach to Link Classification in Signed Networks
- Spectral Clustering of Graphs with General Degrees in the Extended Planted Partition Model
- Toward Understanding Complex Spaces: Graph Laplacians on Manifolds with Singularities and Boundaries
- Exact Recovery of Sparsely-Used Dictionaries
- Near-Optimal Algorithms for Online Matrix Prediction
- Analysis of Thompson Sampling for the Multi-armed Bandit Problem
- Autonomous Exploration For Navigating In MDPs
- Towards Minimax Policies for Online Linear Optimization with Bandit Feedback
- The Best of Both Worlds: Stochastic and Adversarial Bandits
- Open Problem: Regret Bounds for Thompson Sampling
- Open Problem: Better Bounds for Online Logistic Regression
- Open Problem: Learning Dynamic Network Models from a Static Snapshot
- Open Problem: Does AdaBoost Always Cycle?
- Open Problem: Is Averaging Needed for Strongly Convex Stochastic Gradient Descent?