TechTalks from event: ICML 2011

Optimization

  • Fast Newton-type Methods for Total Variation Regularization Authors: Álvaro Barbero; Suvrit Sra
    Numerous applications in statistics, signal processing, and machine learning regularize using Total Variation (TV) penalties. We study anisotropic (l1-based) TV and also a related l2-norm variant. For both variants we consider the associated (1D) proximity operators, which lead to challenging optimization problems. We solve these problems by developing Newton-type methods that outperform the state-of-the-art algorithms. More importantly, our 1D-TV algorithms serve as building blocks for solving the harder task of computing 2- (and higher-) dimensional TV proximity. We illustrate the computational benefits of our methods by applying them to several applications: (i) image denoising; (ii) image deconvolution (by plugging our TV solvers into publicly available software); and (iii) four variants of fused-lasso. The results show large speedups, and to support our claims, we provide software accompanying this paper.
  • The Constrained Weight Space SVM: Learning with Ranked Features Authors: Kevin Small; Byron Wallace; Carla Brodley; Thomas Trikalinos
    Applying supervised learning methods to new classification tasks requires domain experts to label sufficient training data for the classifier to achieve acceptable performance. It is desirable to mitigate this annotation effort. To this end, a pertinent observation is that instance labels are often an indirect form of supervision; it may be more efficient to impart domain knowledge directly to the model in the form of labeled features. We present a novel algorithm for exploiting such domain knowledge which we call the Constrained Weight Space SVM (CW-SVM). In addition to exploiting binary labeled features, our approach allows domain experts to provide ranked features, and, more generally, to express arbitrary expected relationships between sets of features. Our empirical results show that the CW-SVM outperforms both baseline supervised learning strategies and previously proposed methods for learning with labeled features.
  • Size-constrained Submodular Minimization through Minimum Norm Base Authors: Kiyohito Nagano; Yoshinobu Kawahara; Kazuyuki Aihara
    A number of combinatorial optimization problems in machine learning can be described as the problem of minimizing a submodular function. It is known that the unconstrained submodular minimization problem can be solved in strongly polynomial time. However, additional constraints make the problem intractable in many settings. In this paper, we discuss submodular minimization under a size constraint, which is NP-hard and generalizes the densest subgraph problem and the uniform graph partitioning problem. Because of NP-hardness, it is difficult to compute an optimal solution even for a prescribed size constraint. In our approach, we do not give approximation algorithms. Instead, the proposed algorithm computes optimal solutions for some of the possible size constraints in polynomial time. Our algorithm utilizes the basic polyhedral theory associated with submodular functions. Additionally, we evaluate the performance of the proposed algorithm through computational experiments.
  • Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning Authors: Sangkyun Lee; Stephen Wright
    Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, whose purpose is to induce structure (for example, sparsity) in the solution, the solution often lies on a low-dimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates an algorithmic strategy in which, once a near-optimal manifold is identified, we switch to an algorithm that searches only in this manifold, which typically has much lower intrinsic dimension than the full space, thus converging quickly to a near-optimal point with the desired structure. Computational results are presented to illustrate these claims.
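
To make the 1D TV proximity operator from the Barbero and Sra abstract concrete, here is a minimal sketch of what it computes in the anisotropic (l1-based) case: prox(y) = argmin_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|. It uses plain projected gradient on the dual problem, a standard first-order method swapped in for illustration. It is not the Newton-type method of the paper, and the name `tv1d_prox` and its parameters are hypothetical.

```python
def tv1d_prox(y, lam, n_iters=5000, step=0.25):
    """Approximate argmin_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]|.

    Hypothetical illustration: projected gradient on the dual. The dual
    variable u has one entry per difference, lives in the box [-lam, lam],
    and the primal solution is x = y - D^T u, where (D x)[i] = x[i+1] - x[i].
    step = 0.25 is safe because the largest eigenvalue of D D^T is below 4.
    """
    n = len(y)
    u = [0.0] * (n - 1)

    def primal(u):
        # (D^T u)[j] = u[j-1] - u[j], so u[i] adds to x[i] and subtracts from x[i+1].
        x = [float(v) for v in y]
        for i, ui in enumerate(u):
            x[i] += ui
            x[i + 1] -= ui
        return x

    for _ in range(n_iters):
        x = primal(u)
        # Dual gradient step u <- u + step * D x, then clip back into [-lam, lam].
        u = [min(lam, max(-lam, ui + step * (x[i + 1] - x[i])))
             for i, ui in enumerate(u)]
    return primal(u)


# A noiseless step signal: result is approximately [1/3, 1/3, 1/3, 29/3, 29/3, 29/3].
print(tv1d_prox([0, 0, 0, 10, 10, 10], lam=1.0))
```

On a clean step the operator keeps the jump but shrinks it: with lam = 1 each side moves lam / 3 toward the other, and once lam exceeds 15 the output collapses to the constant mean 5. This simple dual scheme can converge slowly on long signals; it is shown only to pin down the problem being solved.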
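
The Small et al. abstract describes giving domain knowledge to the model as relationships between feature weights. Here is a minimal sketch of one such relationship, under our own assumptions: the ranked-feature statement "feature i ranks above feature j" is read as the hard constraint w[i] >= w[j] on a linear SVM's weight vector and enforced by projection inside a Pegasos-style subgradient solver. This is not the CW-SVM formulation from the paper; the solver choice, the toy data, and the names `train_ranked_svm` and `project_ranked` are hypothetical.

```python
import random


def project_ranked(w, ranked_pairs):
    """Enforce w[i] >= w[j] for each (i, j) by averaging violated pairs.

    For disjoint pairs this is the exact Euclidean projection; for pairs that
    share features it is only a simple approximation (a hypothetical choice).
    """
    w = list(w)
    for i, j in ranked_pairs:
        if w[i] < w[j]:
            avg = 0.5 * (w[i] + w[j])
            w[i] = w[j] = avg
    return w


def train_ranked_svm(X, y, ranked_pairs, lam=0.01, epochs=200, seed=0):
    """Linear SVM (L2 regularizer + hinge loss) trained with Pegasos-style
    projected subgradient steps; labels y[k] are in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for k in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[k] * sum(wi * xi for wi, xi in zip(w, X[k]))
            # Shrink for the regularizer, then add the hinge subgradient if active.
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                w = [wi + eta * y[k] * xi for wi, xi in zip(w, X[k])]
            # Project back onto the weight-space constraints.
            w = project_ranked(w, ranked_pairs)
    return w


X = [[0.0, 1.0], [0.0, -1.0], [0.1, 2.0], [-0.1, -2.0]]
y = [1, -1, 1, -1]
w_free = train_ranked_svm(X, y, ranked_pairs=[])
w_ranked = train_ranked_svm(X, y, ranked_pairs=[(0, 1)])
print(w_free, w_ranked)
```

On this toy data the labels are explained almost entirely by feature 1, so the unconstrained weights favor feature 1; with the pair (0, 1) the returned weights satisfy w[0] >= w[1] while the hinge loss still drives the fit.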
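
The Nagano et al. abstract concerns minimizing a submodular f(S) subject to |S| = k. As a minimal sketch of the problem itself (not of the paper's minimum-norm-base algorithm), the code below defines f(S) = -(number of edges inside S), a submodular function whose size-constrained minimizer is a densest k-vertex subgraph, checks submodularity by brute force, and solves the constrained problem by exhaustive search. Exhaustive search is exponential in the ground-set size and only suits tiny examples; all function names are hypothetical.

```python
from itertools import combinations


def neg_internal_edges(S, edges):
    """f(S) = -(number of edges with both endpoints in S).

    Adding a vertex lowers f by its number of neighbours already in S; that
    drop can only grow as S grows, so f is submodular.
    """
    return -sum(1 for u, v in edges if u in S and v in S)


def is_submodular(f, ground):
    """Brute-force diminishing-returns check: f(S+i) - f(S) >= f(T+i) - f(T) for S <= T."""
    ground = list(ground)
    subsets = [set(c) for r in range(len(ground) + 1) for c in combinations(ground, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for i in ground:
                if i not in T and f(S | {i}) - f(S) < f(T | {i}) - f(T):
                    return False
    return True


def size_constrained_min(f, ground, k):
    """Exhaustive minimization of f(S) subject to |S| = k (tiny inputs only)."""
    best = min(combinations(ground, k), key=lambda c: f(set(c)))
    return set(best), f(set(best))


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
f = lambda S: neg_internal_edges(S, edges)
print(is_submodular(f, range(5)))          # True
print(size_constrained_min(f, range(5), 3))  # ({0, 1, 2}, -3)
```

The triangle {0, 1, 2} is the densest 3-vertex subgraph of this graph, so it is the unique size-3 minimizer.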
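
The Lee and Wright abstract describes dual averaging identifying the low-dimensional manifold (for l1 regularization, the sparsity pattern) on which the solution lies, followed by a switch to a solver restricted to it. Here is a minimal sketch under our own assumptions: the closed-form l1 update from Xiao's regularized dual averaging (RDA) with proximal weight gamma * sqrt(t), applied to a tiny least-squares problem whose data stream cycles deterministically (for reproducibility) instead of being sampled at random, followed by plain gradient descent on the identified support. The names, data, and parameters are hypothetical; this is not the authors' code.

```python
import math


def l1_rda(samples, n_features, lam, gamma, n_iters):
    """l1-regularized dual averaging for least squares, visiting samples cyclically.

    samples: list of (a, b) pairs with per-sample loss 0.5 * (a.w - b)^2.
    Each step averages all past gradients into g_bar and sets, coordinatewise,
    w[i] = 0 if |g_bar[i]| <= lam, else -(sqrt(t)/gamma) * (g_bar[i] - lam*sign(g_bar[i])).
    """
    g_sum = [0.0] * n_features
    w = [0.0] * n_features
    for t in range(1, n_iters + 1):
        a, b = samples[(t - 1) % len(samples)]
        residual = sum(ai * wi for ai, wi in zip(a, w)) - b
        for i in range(n_features):
            g_sum[i] += residual * a[i]
        scale = math.sqrt(t) / gamma
        for i in range(n_features):
            g_bar = g_sum[i] / t
            if abs(g_bar) <= lam:
                w[i] = 0.0
            else:
                w[i] = -scale * (g_bar - math.copysign(lam, g_bar))
    return w


def solve_on_support(samples, w, lam, step=1.0, n_iters=200):
    """Second phase: keep the support and signs found by RDA fixed. The objective
    is smooth there, so plain gradient descent on those coordinates suffices.
    (Assumes the signs stay correct; no sign re-check is done.)"""
    support = [i for i, wi in enumerate(w) if wi != 0.0]
    v = list(w)
    m = len(samples)
    for _ in range(n_iters):
        grad = {i: lam * math.copysign(1.0, w[i]) for i in support}
        for a, b in samples:
            residual = sum(ai * vi for ai, vi in zip(a, v)) - b
            for i in support:
                grad[i] += residual * a[i] / m
        for i in support:
            v[i] -= step * grad[i]
    return v


samples = [([1.0, 0.0, 0.0], 2.0), ([0.0, 1.0, 0.0], 0.1), ([0.0, 0.0, 1.0], -1.5)]
w = l1_rda(samples, n_features=3, lam=0.1, gamma=1.0, n_iters=30000)
print([i for i, wi in enumerate(w) if wi != 0.0])  # identified support
print(solve_on_support(samples, w, lam=0.1))
```

Here the averaged objective is (1/3) * sum_i 0.5 * (w[i] - b[i])^2 + 0.1 * ||w||_1, whose minimizer is [1.7, 0, -1.2]. Coordinate 1 is zero at every step because its averaged gradient never exceeds lam in magnitude, which is the kind of stable support identification the paper analyzes; the second phase then reaches the minimizer quickly because the restricted problem is a smooth quadratic.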

Learning Theory

  • Multiple Instance Learning with Manifold Bags Authors: Boris Babenko; Nakul Verma; Piotr Dollar; Serge Belongie
    In many machine learning applications, labeling every instance of data is burdensome. Multiple Instance Learning (MIL), in which training data is provided in the form of labeled bags rather than labeled instances, is one approach for a more relaxed form of supervised learning. Though much progress has been made in analyzing MIL problems, existing work considers bags that have a finite number of instances. In this paper we argue that in many applications of MIL (e.g., image and audio) the bags are better modeled as low-dimensional manifolds in high-dimensional feature space. We show that the geometric structure of such manifold bags affects PAC-learnability. We discuss how a learning algorithm that is designed for finite-sized bags can be adapted to learn from manifold bags. Furthermore, we propose a simple heuristic that reduces the memory requirements of such algorithms. Our experiments on real-world data validate our analysis and show that our approach works well.
  • Minimax Learning Rates for Bipartite Ranking and Plug-in Rules Authors: Sylvain Robbiano; Stéphan Clémençon
    While it is now well known in the standard binary classification setup that, under suitable margin assumptions and complexity conditions on the regression function, fast or even super-fast rates (i.e., rates faster than n^(-1/2) or even faster than n^(-1)) can be achieved by plug-in classifiers, no result of this nature has yet been proved for bipartite ranking, even though that problem is closely related to classification. The main purpose of the present paper is to investigate this issue. Viewing bipartite ranking as a nested continuous collection of cost-sensitive classification problems, we exhibit a global low-noise condition under which certain plug-in ranking rules can be shown to achieve fast (but not super-fast) rates, thus establishing minimax upper bounds for the excess ranking risk.
  • From PAC-Bayes Bounds to Quadratic Programs for Majority Votes Authors: Jean-Francis Roy; Francois Laviolette; Mario Marchand
    We propose to construct a weighted majority vote on a set of basis functions by minimizing a risk bound (called the C-bound) that depends on the first two moments of the margin of the Q-convex combination realized on the training data. This bound minimization algorithm turns out to be a quadratic program that can be efficiently solved. A first version of the algorithm is designed for the supervised inductive setting and turns out to be competitive with AdaBoost, MDBoost and the SVM. The second version of the algorithm, designed for the transductive setting, competes well with TSVM. We also propose a new PAC-Bayes theorem that bounds the difference between the "true" value of the C-bound and its empirical estimate and that, unexpectedly, contains no KL-divergence.
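
The Babenko et al. abstract models a bag as a low-dimensional manifold rather than a finite set of instances. Here is a minimal sketch, under our own assumptions, of the simplest way to apply a finite-bag rule to such a bag: sample points along the manifold and apply the standard MIL assumption that a bag is positive if any of its instances is. The circle bag, the threshold classifier, and the even sampling scheme are hypothetical; this is neither the paper's algorithm nor its memory-saving heuristic.

```python
import math


def instance_positive(x):
    """Hypothetical instance classifier: positive iff the first coordinate exceeds 0.95."""
    return x[0] > 0.95


def circle_bag(theta):
    """Hypothetical 1-D manifold bag: the unit circle in R^2, parametrized by theta in [0, 1)."""
    return (math.cos(2 * math.pi * theta), math.sin(2 * math.pi * theta))


def bag_label(bag, n_samples):
    """Standard MIL rule (bag positive iff some instance is positive), applied
    to n_samples evenly spaced points of the manifold bag."""
    return any(instance_positive(bag((j + 0.5) / n_samples)) for j in range(n_samples))


print(bag_label(circle_bag, 4))   # False
print(bag_label(circle_bag, 64))  # True
```

Only about 10% of this circle lies in the positive region, so a coarse sample of 4 points misses it while 64 points find it. How densely a manifold bag must be sampled depends on its geometry, which loosely echoes the abstract's point that bag geometry matters.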
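
In the Robbiano and Clémençon abstract, a plug-in ranking rule orders instances by an estimate of the regression function eta(x) = P(Y = 1 | X = x). Here is a minimal sketch under our own assumptions: eta is estimated by bin averages on [0, 1], and ranking quality is measured by the empirical AUC, the fraction of (positive, negative) pairs ordered correctly with ties counted as one half. The histogram estimator, the bin count, and the function names are hypothetical choices for illustration and carry none of the paper's rate guarantees.

```python
def histogram_eta(xs, ys, n_bins=4):
    """Plug-in estimate of eta(x) = P(Y = 1 | X = x) for x in [0, 1]: the fraction
    of positive labels in x's bin (0.5 for an empty bin, an arbitrary choice)."""
    counts = [0] * n_bins
    positives = [0] * n_bins

    def bin_of(x):
        return min(int(x * n_bins), n_bins - 1)

    for x, y in zip(xs, ys):
        counts[bin_of(x)] += 1
        positives[bin_of(x)] += y

    def eta_hat(x):
        b = bin_of(x)
        return positives[b] / counts[b] if counts[b] else 0.5

    return eta_hat


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    total = 0.0
    for p in pos:
        for q in neg:
            total += 1.0 if p > q else (0.5 if p == q else 0.0)
    return total / (len(pos) * len(neg))


xs = [0.05, 0.15, 0.3, 0.45, 0.55, 0.7, 0.85, 0.95]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
eta_hat = histogram_eta(xs, ys)
print(empirical_auc([eta_hat(x) for x in xs], ys))  # 0.875
```

The plug-in rule only needs the estimated eta to order instances correctly, not to be calibrated; the ties produced by the coarse bins are what cost it the remaining AUC here.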
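
The Roy, Laviolette, and Marchand abstract builds on the C-bound, which bounds the risk of a Q-weighted majority vote by 1 - mu1^2 / mu2, where mu1 and mu2 are the first two moments of the margin y * sum_h Q(h) h(x) (valid when mu1 > 0). Here is a minimal sketch that only evaluates the empirical C-bound for voters with outputs in {-1, +1}; it does not solve the paper's quadratic program, and the function names and toy votes are hypothetical.

```python
def majority_vote_margins(votes, weights, labels):
    """Margin y * sum_j Q(j) * h_j(x) of the Q-weighted vote on each example.

    votes[k][j] in {-1, +1} is voter j's output on example k; weights is the
    distribution Q over voters; labels[k] is in {-1, +1}.
    """
    return [y * sum(q * h for q, h in zip(weights, row)) for row, y in zip(votes, labels)]


def empirical_c_bound(margins):
    """1 - mu1^2 / mu2 computed from the empirical margin moments."""
    m = len(margins)
    mu1 = sum(margins) / m
    mu2 = sum(g * g for g in margins) / m
    if mu1 <= 0:
        raise ValueError("the C-bound requires a positive first moment of the margin")
    return 1.0 - mu1 ** 2 / mu2


def majority_vote_risk(margins):
    """Empirical risk of the majority vote, counting ties (margin 0) as errors."""
    return sum(1 for g in margins if g <= 0) / len(margins)


votes = [[1, 1, -1], [-1, -1, -1], [1, 1, 1], [1, -1, -1]]
labels = [1, -1, 1, -1]
margins = majority_vote_margins(votes, [1 / 3] * 3, labels)
print(majority_vote_risk(margins), empirical_c_bound(margins))
```

Because the C-bound follows from the one-sided Chebyshev (Cantelli) inequality applied to the margin, it also holds for the empirical distribution, so the empirical majority-vote risk never exceeds the empirical C-bound. Here the margins are 1/3, 1, 1, 1/3, giving mu1 = 2/3, mu2 = 5/9, and a C-bound of 0.2 against a risk of 0.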

Invited Cross-Conference Session