TechTalks from event: ICML 2011
BCDNPKL: Scalable Non-Parametric Kernel Learning Using Block Coordinate Descent
Most existing approaches for non-parametric kernel learning (NPKL) suffer from expensive computation, which limits their application to large-scale problems. To address the scalability problem of NPKL, we propose a novel algorithm called BCDNPKL, which is very efficient and scalable. Unlike most existing approaches, BCDNPKL avoids semidefinite programming (SDP) and eigen-decomposition. This is possible because of two findings: 1) the original SDP framework of NPKL can be reduced to a far smaller counterpart that corresponds to learning a sub-kernel (referred to as the boundary kernel); 2) the sub-kernel learning problem can be solved efficiently using the proposed block coordinate descent (BCD) technique. We provide a formal proof of global convergence for the proposed BCDNPKL algorithm. Extensive experiments verify the scalability and effectiveness of BCDNPKL compared with state-of-the-art algorithms.
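To illustrate the general idea of block coordinate descent mentioned in this abstract, here is a minimal sketch on a toy problem. It is not the BCDNPKL algorithm: it minimizes a small quadratic `0.5 * x^T A x - b^T x` by solving for one block of coordinates at a time while the others stay fixed. The matrix `A`, the vector `b`, the block partition, and the helper names `solve_small` and `block_coordinate_descent` are all assumptions made for this example.

```python
def solve_small(M, r):
    """Solve M z = r by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(r)
    aug = [list(M[i]) + [r[i]] for i in range(n)]
    for c in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    # Back substitution.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z


def block_coordinate_descent(A, b, blocks, sweeps=100):
    """Minimize 0.5*x^T A x - b^T x (A symmetric positive definite), one block at a time."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for B in blocks:
            in_block = set(B)
            # Right-hand side of the block subproblem, with all other coordinates held fixed.
            rhs = [b[i] - sum(A[i][j] * x[j] for j in range(n) if j not in in_block)
                   for i in B]
            A_BB = [[A[i][j] for j in B] for i in B]
            # Exact minimization over the block: solve A_BB x_B = rhs.
            for i, v in zip(B, solve_small(A_BB, rhs)):
                x[i] = v
    return x


# Toy data (hypothetical): a symmetric, diagonally dominant matrix.
A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 3.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
b = [1.0, 2.0, 3.0, 4.0]
x = block_coordinate_descent(A, b, blocks=[[0, 1], [2, 3]])
# At the minimizer, A x = b, so the residual should be close to zero.
residual = max(abs(sum(A[i][j] * x[j] for j in range(4)) - b[i]) for i in range(4))
```

Each block update only needs a system the size of the block, which is the general reason block methods can be cheaper than solving the full problem at once.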
Ultra-Fast Optimization Algorithm for Sparse Multi Kernel Learning
Many state-of-the-art approaches for Multi Kernel Learning (MKL) struggle to find a compromise between performance, sparsity of the solution, and speed of the optimization process. In this paper we look at the MKL problem simultaneously from a learning and an optimization point of view. Instead of designing a regularizer and then struggling to find an efficient method to minimize it, we design the regularizer while keeping the optimization algorithm in mind. Hence, we introduce a novel MKL formulation, which mixes elements of p-norm and elastic-net regularization. We also propose a fast stochastic gradient descent method that solves the novel MKL formulation. We show theoretically and empirically that our method has 1) state-of-the-art performance on many classification tasks; 2) exact sparse solutions with a tunable level of sparsity; 3) a convergence rate bound that depends only logarithmically on the number of kernels used, and is independent of the sparsity required; 4) independence from the particular convex loss function used.
Fast Global Alignment Kernels
We propose novel approaches to cast the widely-used family of Dynamic Time Warping (DTW) distances and similarities as positive definite kernels for time series. To this effect, we provide new theoretical insights on the family of Global Alignment kernels introduced by Cuturi et al. (2007) and propose alternative kernels which are both positive definite and faster to compute. We provide experimental evidence that these alternatives are both faster and more efficient in classification tasks than other kernels based on the DTW formalism.
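For readers unfamiliar with Global Alignment kernels, the sketch below shows the standard quadratic-time dynamic program that sums a product of local similarities over all alignments of two series, where DTW would keep only the best alignment. It is the baseline recurrence, not the faster alternatives proposed in this talk. The Gaussian local similarity, the `sigma` parameter, and the function name `global_alignment_kernel` are assumptions made for illustration, and the sketch makes no claim about positive definiteness under that choice.

```python
import math


def global_alignment_kernel(x, y, sigma=1.0):
    """Sum over all alignments of x and y of the product of local similarities."""
    n, m = len(x), len(y)
    # M[i][j] accumulates the score of all alignments of x[:i] with y[:j].
    M = [[0.0] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 1.0  # the empty alignment
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local similarity: a Gaussian on the pointwise difference.
            local = math.exp(-((x[i - 1] - y[j - 1]) ** 2) / (2.0 * sigma ** 2))
            # An alignment ending at (i, j) extends one ending at any of its
            # three predecessors: (i-1, j), (i, j-1), or (i-1, j-1).
            M[i][j] = local * (M[i - 1][j] + M[i][j - 1] + M[i - 1][j - 1])
    return M[n][m]


# Two short scalar series (hypothetical data).
a = [0.0, 1.0, 2.0, 1.0]
b = [0.0, 2.0, 1.0]
k_ab = global_alignment_kernel(a, b)
k_ba = global_alignment_kernel(b, a)
```

The table has `(n + 1) * (m + 1)` entries, so the cost grows with the product of the two series lengths, which is the cost that faster variants aim to reduce.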
Mapping kernels for trees
We propose a comprehensive survey of tree kernels through the lens of the mapping kernels framework. We argue that most existing tree kernels, as well as many more that are presented for the first time in this paper, fall into a typology of kernels whose seemingly intricate computation can be efficiently factorized to yield polynomial-time algorithms. Despite this fact, we argue that a naive implementation of such kernels remains prohibitively expensive to compute. We propose an approach whereby some computations for smaller trees are cached, which considerably speeds up the computation of all these tree kernels. We provide experimental evidence of this fact as well as preliminary results on the performance of these kernels.
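The caching idea can be illustrated with a toy example. The sketch below is not the mapping kernels framework from the talk: it uses a simplified recursion in the style of Collins and Duffy that compares only node labels and arity, and it memoizes results for pairs of subtrees so that repeated subtree pairs are computed once. The tuple encoding of trees and the names `co_rooted`, `nodes`, and `tree_kernel` are assumptions made for this example.

```python
from functools import lru_cache

# A tree is a nested tuple: (label, child_1, child_2, ...). Tuples are
# hashable, so structurally identical subtree pairs share one cache entry.


@lru_cache(maxsize=None)
def co_rooted(a, b):
    """Count matching fragments rooted at nodes a and b (simplified recursion)."""
    if a[0] != b[0] or len(a) != len(b):
        return 0  # different label or different number of children
    total = 1
    for child_a, child_b in zip(a[1:], b[1:]):
        # Each child pair either stops here (1) or continues matching below.
        total *= 1 + co_rooted(child_a, child_b)
    return total


def nodes(tree):
    """Yield every subtree of the given tree, root first."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)


def tree_kernel(t1, t2):
    """Sum the co-rooted counts over all pairs of nodes from the two trees."""
    return sum(co_rooted(a, b) for a in nodes(t1) for b in nodes(t2))


t = ("S", ("A",), ("B",))
k = tree_kernel(t, t)
```

Without the cache, the same pair of small subtrees can be re-evaluated many times when trees contain repeated structure; with it, each distinct pair is evaluated once.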