TechTalks from event: ICML 2011

Optimization

  • Fast Newton-type Methods for Total Variation Regularization. Authors: Álvaro Barbero; Suvrit Sra
    Numerous applications in statistics, signal processing, and machine learning regularize using Total Variation (TV) penalties. We study anisotropic (l1-based) TV and also a related l2-norm variant. For both variants we consider the associated (1D) proximity operators, which lead to challenging optimization problems. We solve these problems by developing Newton-type methods that outperform the state-of-the-art algorithms. More importantly, our 1D-TV algorithms serve as building blocks for the harder task of computing 2D (and higher-dimensional) TV proximity. We illustrate the computational benefits of our methods in several applications: (i) image denoising; (ii) image deconvolution (by plugging our TV solvers into publicly available software); and (iii) four variants of the fused lasso. The results show large speedups; to support our claims, we provide software accompanying this paper. (A toy sketch of the 1D TV proximity subproblem appears after this list.)
  • The Constrained Weight Space SVM: Learning with Ranked Features. Authors: Kevin Small; Byron Wallace; Carla Brodley; Thomas Trikalinos
    Applying supervised learning methods to new classification tasks requires domain experts to label sufficient training data for the classifier to achieve acceptable performance. It is desirable to mitigate this annotation effort. To this end, a pertinent observation is that instance labels are often an indirect form of supervision; it may be more efficient to impart domain knowledge directly to the model in the form of labeled features. We present a novel algorithm for exploiting such domain knowledge, which we call the Constrained Weight Space SVM (CW-SVM). In addition to exploiting binary labeled features, our approach allows domain experts to provide ranked features and, more generally, to express arbitrary expected relationships between sets of features. Our empirical results show that the CW-SVM outperforms both baseline supervised learning strategies and previously proposed methods for learning with labeled features. (A generic sketch of weight-ordering constraints appears after this list.)
  • Size-constrained Submodular Minimization through Minimum Norm Base. Authors: Kiyohito Nagano; Yoshinobu Kawahara; Kazuyuki Aihara
    A number of combinatorial optimization problems in machine learning can be described as the problem of minimizing a submodular function. It is known that the unconstrained submodular minimization problem can be solved in strongly polynomial time. However, additional constraints make the problem intractable in many settings. In this paper, we discuss submodular minimization under a size constraint, which is NP-hard and generalizes the densest subgraph problem and the uniform graph partitioning problem. Because of this NP-hardness, it is difficult to compute an optimal solution even for a single prescribed size. Our approach does not give approximation algorithms; instead, the proposed algorithm computes optimal solutions for some of the possible size constraints in polynomial time. The algorithm relies on the basic polyhedral theory associated with submodular functions, and we evaluate its performance through computational experiments. (A toy sketch of the minimum norm base appears after this list.)
  • Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning. Authors: Sangkyun Lee; Stephen Wright
    Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term whose purpose is to induce structure (for example, sparsity) in the solution, the solution often lies on a low-dimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates an algorithmic strategy in which, once a near-optimal manifold is identified, we switch to an algorithm that searches only within this manifold, which typically has much lower intrinsic dimension than the full space, thus converging quickly to a near-optimal point with the desired structure. Computational results are presented to illustrate these claims. (A toy sketch of the l1 dual averaging update appears after this list.)
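
The core subproblem in the Barbero and Sra talk is the 1D anisotropic TV proximity operator. As a point of reference only (not the authors' Newton-type method), the sketch below solves that subproblem by plain projected gradient on its dual; the function name, step size, iteration count, and toy signal are illustrative assumptions.

    import numpy as np

    def prox_tv1d(y, lam, n_iter=5000):
        # 1D anisotropic TV proximity operator (toy reference implementation):
        #     argmin_x  0.5 * ||x - y||^2  +  lam * sum_i |x[i+1] - x[i]|
        # solved here by projected gradient on its dual,
        #     min_u  0.5 * ||D^T u - y||^2   s.t.  |u_i| <= lam,
        # where D is the forward-difference operator and x* = y - D^T u*.
        y = np.asarray(y, dtype=float)
        u = np.zeros(len(y) - 1)
        step = 0.25                                          # 1/L with L = ||D D^T||_2 < 4
        for _ in range(n_iter):
            x = y + np.diff(u, prepend=0.0, append=0.0)      # x = y - D^T u
            u = np.clip(u + step * np.diff(x), -lam, lam)    # gradient step, then box projection
        return y + np.diff(u, prepend=0.0, append=0.0)

    # Toy usage: denoise a noisy piecewise-constant signal.
    rng = np.random.default_rng(0)
    signal = np.repeat([0.0, 2.0, -1.0], 50) + 0.3 * rng.standard_normal(150)
    denoised = prox_tv1d(signal, lam=1.0)

The same 1D routine is the building block the talk reuses for 2D and higher-dimensional TV proximity; the paper's Newton-type solvers address the same subproblem far more efficiently than this first-order reference.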
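
For the CW-SVM talk, the sketch below is a generic illustration of how ranked-feature knowledge can be encoded as ordering constraints on a linear model's weights, using a smooth squared-hinge loss and scipy's SLSQP solver. It is not the CW-SVM formulation from the paper; the constraint encoding, function names, and toy data are assumptions.

    import numpy as np
    from scipy.optimize import minimize

    def ranked_feature_svm(X, y, ranked_pairs, C=1.0):
        # L2-regularized squared-hinge loss with ordering constraints on the weights:
        #     min_w  0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * <w, x_i>)^2
        #     s.t.   w[a] >= w[b]   for every (a, b) in ranked_pairs.
        # The squared hinge keeps the objective smooth for the SQP solver.
        # Generic stand-in only, not the CW-SVM formulation from the paper.
        d = X.shape[1]

        def objective(w):
            margins = np.maximum(0.0, 1.0 - y * (X @ w))
            return 0.5 * w @ w + C * np.sum(margins ** 2)

        constraints = [{"type": "ineq", "fun": (lambda w, a=a, b=b: w[a] - w[b])}
                       for a, b in ranked_pairs]
        res = minimize(objective, np.zeros(d), method="SLSQP", constraints=constraints)
        return res.x

    # Toy usage: the expert ranks feature 0 as at least as indicative as feature 1.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 3))
    y = np.sign(X @ np.array([2.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(40))
    w = ranked_feature_svm(X, y, ranked_pairs=[(0, 1)])

Pairwise constraints of this kind express "feature a is at least as indicative as feature b"; the talk's contribution is a principled SVM formulation over such constrained weight spaces, including more general relationships between sets of features.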
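
For the size-constrained submodular minimization talk, the sketch below brute-forces the minimum norm base of a tiny cut function by enumerating every base-polytope constraint and solving the resulting QP with scipy. This is only feasible for toy ground sets, and the question of which level sets are optimal for which sizes is the paper's contribution, not reproduced here; the graph, weights, and tolerances are illustrative assumptions.

    import itertools
    import numpy as np
    from scipy.optimize import minimize

    # Toy submodular function: cut function of a small weighted graph on V = {0, 1, 2, 3}.
    V = [0, 1, 2, 3]
    edges = {(0, 1): 3.0, (1, 2): 1.0, (2, 3): 3.0, (0, 3): 1.0}

    def f(S):
        S = set(S)
        return sum(w for (a, b), w in edges.items() if (a in S) != (b in S))

    # Minimum norm base: argmin ||z||^2 over the base polytope
    #     B(f) = { z : z(S) <= f(S) for all S subset V,  z(V) = f(V) }.
    # For |V| = 4 every subset constraint can simply be enumerated; real instances
    # need a minimum-norm-point algorithm rather than a generic QP solver.
    cons = [{"type": "eq", "fun": lambda z: z.sum() - f(V)}]
    for r in range(1, len(V)):
        for S in itertools.combinations(V, r):
            cons.append({"type": "ineq",
                         "fun": (lambda z, S=S: f(S) - z[list(S)].sum())})

    res = minimize(lambda z: z @ z, np.zeros(len(V)), method="SLSQP", constraints=cons)
    z_star = res.x

    # The distinct values of z_star induce a nested chain of level sets; the paper's
    # analysis identifies which of these are optimal for their respective sizes.
    for t in sorted(set(np.round(z_star, 6))):
        S_t = [i for i in V if z_star[i] <= t + 1e-6]
        print(len(S_t), S_t, f(S_t))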
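
For the dual averaging talk, the sketch below implements a standard l1-regularized dual averaging update (in the style of Xiao's RDA) on a toy logistic-regression stream, to show the soft-thresholded closed form that places iterates exactly on a sparse manifold. The manifold-identification and switching strategy analyzed in the paper is not implemented, and the constants and toy data are assumptions.

    import numpy as np

    def l1_rda_logistic(stream, d, lam=0.05, gamma=1.0):
        # l1-regularized dual averaging for logistic loss.  At step t the iterate
        # has the closed form
        #     w[i] = -(sqrt(t)/gamma) * sign(gbar[i]) * max(|gbar[i]| - lam, 0),
        # where gbar is the running average of loss gradients.  Coordinates with
        # |gbar[i]| <= lam are exactly zero, i.e. the iterate sits on a sparse
        # manifold of the kind the paper shows RDA identifies with high probability.
        w = np.zeros(d)
        gbar = np.zeros(d)
        for t, (x, y) in enumerate(stream, start=1):
            g = -y * x / (1.0 + np.exp(y * (x @ w)))   # gradient of log(1 + exp(-y <w, x>))
            gbar += (g - gbar) / t                     # running average of gradients
            w = -(np.sqrt(t) / gamma) * np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        return w

    # Toy usage: sparse ground truth; only the first two coordinates matter.
    rng = np.random.default_rng(0)
    w_true = np.array([2.0, -1.5, 0.0, 0.0, 0.0])
    stream = []
    for _ in range(2000):
        x = rng.standard_normal(5)
        y = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-(x @ w_true))) else -1.0
        stream.append((x, y))
    w_hat = l1_rda_logistic(stream, d=5)
    print(np.nonzero(w_hat)[0])   # ideally recovers the support {0, 1}

Once the correct zero pattern has been identified, the strategy described in the talk switches to an optimizer restricted to that low-dimensional manifold, where the regularizer is smooth and convergence is faster.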