TechTalks from event: ICML 2011

Sparsity and Compressed Sensing

  • Efficient Sparse Modeling with Automatic Feature Grouping Authors: Wenliang Zhong; James Kwok
    The grouping of features is highly beneficial in learning with high-dimensional data. It reduces the variance in the estimation and improves the stability of feature selection, leading to improved generalization. Moreover, it can also help in data understanding and interpretation. OSCAR is a recent sparse modeling tool that achieves this by using an $\ell_1$-regularizer and a pairwise $\ell_\infty$-regularizer. However, its optimization is computationally expensive. In this paper, we propose an efficient solver based on accelerated gradient methods. We show that its key projection step can be solved by a simple iterative group-merging algorithm. It is highly efficient and reduces the empirical time complexity from $O(d^3 \sim d^5)$ for the existing solvers to just $O(d)$, where $d$ is the number of features. Experimental results on toy and real-world data sets demonstrate that OSCAR is a competitive sparse modeling approach with the added ability of automatic feature grouping.
  • Robust Matrix Completion and Corrupted Columns Authors: Yudong Chen; Huan Xu; Constantine Caramanis; Sujay Sanghavi
    This paper considers the problem of matrix completion when some number of the columns are arbitrarily corrupted. It is well known that standard algorithms for matrix completion can return arbitrarily poor results if even a single column is corrupted. What can be done if a large number, or even a constant fraction, of columns are corrupted? In this paper, we study this very problem and develop a robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering, where some number of users are so-called manipulators who try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the observed entries of the manipulated columns.
  • Clustering Partially Observed Graphs via Convex Optimization Authors: Ali Jalali; Yudong Chen; Sujay Sanghavi; Huan Xu
    This paper considers the problem of clustering a partially observed unweighted graph -- i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining pairs we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters and sparse connectivity across clusters. We take a novel yet natural approach to this problem by focusing on finding the clustering that minimizes the number of "disagreements" -- i.e., the sum of the number of (observed) missing edges within clusters and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We show that our algorithm succeeds under certain natural assumptions on the optimal clustering and its disagreements. Our results significantly strengthen existing matrix-splitting results for the special case of our clustering problem, and directly enhance solutions to the problem of Correlation Clustering with partial observations.
  • Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions Authors: Alekh Agarwal; Sahand Negahban; Martin Wainwright
    We analyze a class of estimators based on a convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of the sum of an (approximately) low-rank matrix $\Theta^\star$ with a second matrix $\Gamma^\star$ endowed with a complementary form of low-dimensional structure. We derive a general theorem that gives upper bounds on the Frobenius-norm error for an estimate of the pair $(\Theta^\star, \Gamma^\star)$ obtained by solving a convex optimization problem. We then specialize our general result to two cases that have been studied in the context of robust PCA: low rank plus sparse structure, and low rank plus column-sparse structure. Our theory yields Frobenius-norm error bounds for both deterministic and stochastic noise matrices, and in the latter case they are minimax optimal. The sharpness of our theoretical predictions is also confirmed in numerical simulations.
  • Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection Authors: Abhimanyu Das; David Kempe
    We study the problem of selecting a subset of $k$ random variables from a large set, in order to obtain the best linear prediction of another variable of interest. This problem can be viewed in the context of both feature selection and sparse approximation. We analyze the performance of widely used greedy heuristics, using insights from the maximization of submodular functions and spectral analysis. We introduce the submodularity ratio as a key quantity to help understand why greedy algorithms perform well even when the variables are highly correlated. Using our techniques, we obtain the strongest known approximation guarantees for this problem, both in terms of the submodularity ratio and the smallest $k$-sparse eigenvalue of the covariance matrix. We also analyze greedy algorithms for the dictionary selection problem, and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on real-world and synthetic data sets; the experiments show that the submodularity ratio is a stronger predictor of the performance of greedy algorithms than other spectral parameters.
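
The OSCAR penalty mentioned in the first abstract combines an $\ell_1$ term with pairwise $\ell_\infty$ terms. With $d$ features and regularization parameters $\lambda_1, \lambda_2 \ge 0$ (notation assumed here for illustration), it can be written as

$$\Omega_{\text{OSCAR}}(\beta) = \lambda_1 \sum_{j=1}^{d} |\beta_j| + \lambda_2 \sum_{j<k} \max\{|\beta_j|, |\beta_k|\}.$$

The pairwise max terms push the coefficients of correlated features toward a common magnitude, which is what produces the automatic feature grouping the abstract refers to.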
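
The "disagreements" objective from the partially observed graph clustering abstract is simple to state concretely. Below is a minimal sketch of the objective itself (the function name and data layout are assumptions for illustration; the paper's actual method is a convex relaxation, not this direct count):

```python
def disagreements(obs, clusters):
    """Count the disagreements of a clustering on a partially observed
    unweighted graph: (observed) missing edges within clusters plus
    (observed) present edges across clusters.

    obs: dict mapping a node pair (i, j) with i < j to 1 (edge observed)
         or 0 (absence of edge observed); unobserved pairs are absent.
    clusters: dict mapping each node to its cluster id.
    """
    count = 0
    for (i, j), present in obs.items():
        same_cluster = clusters[i] == clusters[j]
        if same_cluster and present == 0:
            count += 1  # observed missing edge inside a cluster
        elif not same_cluster and present == 1:
            count += 1  # observed present edge across clusters
    return count
```

Minimizing this count over all clusterings is the combinatorial problem that the paper reduces to low-rank-plus-sparse matrix recovery.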
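
The greedy heuristic analyzed in the last abstract can be sketched as forward selection: at each step, add the variable that most improves the least-squares fit of the target. A minimal illustration (the function name and the uncentered explained-energy score are assumptions, not the authors' code):

```python
import numpy as np

def greedy_forward_selection(X, y, k):
    """Greedily pick k columns of X for linear prediction of y.

    At each step, add the column that maximizes the fraction of the
    energy of y explained by a least-squares fit on the chosen columns.
    Returns the chosen column indices and the final score.
    """
    n, d = X.shape
    chosen = []
    best_score = 0.0
    for _ in range(k):
        best_j = None
        for j in range(d):
            if j in chosen:
                continue
            cols = chosen + [j]
            # Least-squares fit of y on the candidate column set.
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            resid = y - X[:, cols] @ beta
            score = 1.0 - (resid @ resid) / (y @ y)
            if best_j is None or score > best_score:
                best_j, best_score = j, score
        chosen.append(best_j)
    return chosen, best_score
```

The abstract's point is that even when columns of $X$ are highly correlated, the quality of this greedy loop can be bounded in terms of the submodularity ratio of the explained-variance objective.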