TechTalks from event: ICML 2011

Kernel Methods and Optimization

  • Learning Output Kernels with Block Coordinate Descent Authors: Francesco Dinuzzo; Cheng Soon Ong; Peter Gehler; Gianluigi Pillonetto
    We propose a method to learn simultaneously a vector-valued function and a kernel between its components. The obtained kernel can be used both to improve learning performances and to reveal structures in the output space which may be important in their own right. Our method is based on the solution of a suitable regularization problem over a reproducing kernel Hilbert space (RKHS) of vector-valued functions. Although the regularized risk functional is non-convex, we show that it is invex, implying that all local minimizers are global minimizers. We derive a block-wise coordinate descent method that efficiently exploits the structure of the objective functional. Then, we empirically demonstrate that the proposed method can improve classification accuracy. Finally, we provide a visual interpretation of the learned kernel matrix for some well known datasets.
  • Implementing regularization implicitly via approximate eigenvector computation Authors: Michael Mahoney; Lorenzo Orecchia
    Regularization is a powerful technique for extracting useful information from noisy data. Typically, it is implemented by adding some sort of norm constraint to an objective function and then exactly optimizing the modified objective function. This procedure often leads to optimization problems that are computationally more expensive than the original problem, a fact that is clearly problematic if one is interested in large-scale applications. On the other hand, a large body of empirical work has demonstrated that heuristics, and in some cases approximation algorithms, developed to speed up computations sometimes have the side-effect of performing regularization implicitly. Thus, we consider the question: What is the regularized optimization objective that an approximation algorithm is exactly optimizing? We address this question in the context of computing approximations to the smallest nontrivial eigenvector of a graph Laplacian; and we consider three random-walk-based procedures: one based on the heat kernel of the graph, one based on computing the the PageRank vector associated with the graph, and one based on a truncated lazy random walk. In each case, we provide a precise characterization of the manner in which the approximation method can be viewed as implicitly computing the exact solution to a regularized problem. Interestingly, the regularization is not on the usual vector form of the optimization problem, but instead it is on a related semidefinite program.
  • Adaptive Kernel Approximation for Large-Scale Non-Linear SVM Prediction Authors: Michele Cossalter; Rong Yan; Lu Zheng
    The applicability of non-linear support vector machines (SVMs) has been limited in large-scale data collections because of their linear prediction complexity to the size of support vectors. We propose an efficient prediction algorithm with performance guarantee for non-linear SVMs, termed AdaptSVM. It can selectively collapse the kernel function computation to a reduced set of support vectors, compensated by an additional correction term that can be easily computed on-line. It also allows adaptive fall-back to original kernel computation based on its estimated variance and maximum error tolerance. In addition to theoretical analysis, we empirically evaluate on multiple large-scale datasets to show that the proposed algorithm can speed up the prediction process up to 10000 times with only <0.5 accuracy loss.
  • Suboptimal Solution Path Algorithm for Support Vector Machine Authors: Masayuki Karasuyama; Ichiro Takeuchi
    We consider a suboptimal solution path algorithm for the Support Vector Machine. The solution path algorithm is known as an effective tool for solving a sequence of a parametrized optimization problems in machine learning. However, the algorithm needs to keep strict optimality conditions satisfied everywhere on the path. This requirement narrows the applicability of the path algorithm and adversely affects its computational efficiency. In our algorithm, user can specify tolerances to the optimality and control the trade-off between accuracy of the solution and the computational cost. We also show that our suboptimal solutions can be interpreted as the solution of a perturbed optimization problem from the original one, provide some theoretical analyses of our algorithm based on a novel interpretation. The experimental results demonstrate the effectiveness of our algorithm in terms of efficiency and accuracy.
  • Functional Regularized Least Squares Classication with Operator-valued Kernels Authors: Hachem Kadri; Asma Rabaoui; Philippe Preux; Emmanuel Duflos; Alain Rakotomamonjy