ICML 2011
TechTalks from event: ICML 2011
Sparsity and Compressed Sensing

Efficient Sparse Modeling with Automatic Feature GroupingThe grouping of features is highly beneficial in learning with highdimensional data. It reduces the variance in the estimation and improves the stability of feature selection, leading to improved generalization. Moreover, it can also help in data understanding and interpretation. OSCAR is a recent sparse modeling tool that achieves this by using a $ell_1$regularizer and a pairwise $ell_infty$regularizer. However, its optimization is computationally expensive. In this paper, we propose an efficient solver based on the accelerated gradient methods. We show that its key projection step can be solved by a simple iterative group merging algorithm. It is highly efficient and reduces the empirical time complexity from $O(d^3 sim d^5)$ for the existing solvers to just $O(d)$, where $d$ is the number of features. Experimental results on toy and realworld data sets demonstrate that OSCAR is a competitive sparse modeling approach with the added ability of automatic feature grouping.

Robust Matrix Completion and Corrupted ColumnsThis paper considers the problem of matrix completion, when some number of the columns are arbitrarily corrupted. It is wellknown that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. What can be done if a large number, or even a constant fraction of columns are corrupted? In this paper, we study this very problem, and develop an robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering. Here, some number of users are socalled manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold {it without any assumptions on the observed entries of the manipulated columns}.

Clustering Partially Observed Graphs via Convex OptimizationThis paper considers the problem of clustering a partially observed unweighted graph  i.e. one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"  i.e. the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) lowrank matrix and an (unknown) sparse matrix from their partially observed sum. We show that our algorithm succeeds under certain natural assumptions on the optimal clustering and its disagreements. Our results significantly strengthen existing matrix splitting results for the special case of our clustering problem. Our results directly enhance solutions to the problem of Correlation Clustering with partial observations

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensionsWe analyze a class of estimators based on a convex relaxation for solving highdimensional matrix decomposition problems. The observations are the noisy realizations of the sum of an (approximately) low rank matrix $Theta^star$ with a second matrix $Gamma^star$ endowed with a complementary form of lowdimensional structure. We derive a general theorem that gives upper bounds on the Frobenius norm error for an estimate of the pair $(Theta^star, Gamma^star)$ obtained by solving a convex optimization problem. We then specialize our general result to two cases that have been studied in the context of robust PCA: low rank plus sparse structure, and low rank plus a column sparse structure. Our theory yields Frobenius norm error bounds for both deterministic and stochastic noise matrices, and in the latter case, they are minimax optimal. The sharpness of our theoretical predictions is also confirmed in numerical simulations.

Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary SelectionWe study the problem of selecting a subset of $k$ random variables from a large set, in order to obtain the best linear prediction of another variable of interest. This problem can be viewed in the context of both feature selection and sparse approximation. We analyze the performance of widely used greedy heuristics, using insights from the maximization of submodular functions and spectral analysis. We introduce the submodularity ratio as a key quantity to help understand why greedy algorithms perform well even when the variables are highly correlated. Using our techniques, we obtain the strongest known approximation guarantees for this problem, both in terms of the submodularity ratio and the smallest $k$sparse eigenvalue of the covariance matrix. We also analyze greedy algorithms for the dictionary selection problem, and significantly improve the previously known guarantees. Our theoretical analysis is complemented by experiments on realworld and synthetic data sets; the experiments show that the submodular ratio is a stronger predictor of the performance of greedy algorithms than other spectral parameters.
 All Sessions
 Keynotes
 Bandits and Online Learning
 Structured Output
 Reinforcement Learning
 Graphical Models and Optimization
 Recommendation and Matrix Factorization
 Neural Networks and Statistical Methods
 LatentVariable Models
 LargeScale Learning
 Learning Theory
 Feature Selection, Dimensionality Reduction
 Invited CrossConference Track
 Neural Networks and Deep Learning
 LatentVariable Models
 Active and Online Learning
 Tutorial : Collective Intelligence and Machine Learning
 Tutorial: Machine Learning in Ecological Science and Environmental Policy
 Tutorial: Machine Learning and Robotics
 Tutorial: Introduction to Bandits: Algorithms and Theory
 Ensemble Methods
 Tutorial: Machine Learning for Large Scale Recommender Systems
 Tutorial: Learning Kernels
 TestofTime
 Best Paper
 Robotics and Reinforcement Learning
 Transfer Learning
 Kernel Methods
 Optimization
 Learning Theory
 Invited CrossConference Session
 Neural Networks and Deep Learning
 Reinforcement Learning
 Bayesian Inference and Probabilistic Models
 Supervised Learning
 Social Networks
 Evaluation Metrics
 statistical relational learning
 Outlier Detection
 Time Series
 Graphical Models and Bayesian Inference
 Sparsity and Compressed Sensing
 Clustering
 Game Theory and Planning and Control
 SemiSupervised Learning
 Kernel Methods and Optimization
 Neural Networks and NLP
 Probabilistic Models & MCMC
 Online Learning
 Ranking and Information Retrieval