In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear learning algorithms from linear ones, by applying the linear algorithms to feature space mappings of the original data. Recently, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by mapping probability distributions to a suitable reproducing kernel Hilbert space (i.e., the feature space is an RKHS). I will describe how probabilities can be mapped to kernel feature spaces, and how to compute the distance between these mappings. This distance is called the maximum mean discrepancy (MMD), and it is a metric on distributions for kernels that satisfy the characteristic property. A measure of dependence between two random variables follows naturally from this distance. The focus will mainly be on the application of the MMD to two-sample and independence testing in high-dimensional and structured domains. I will also briefly cover embeddings of conditional distributions and their application in inference.
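To make the MMD concrete, here is a minimal NumPy sketch, not the speaker's implementation. It assumes a Gaussian (RBF) kernel with a fixed bandwidth `sigma=1.0` (in practice the bandwidth is often chosen by a heuristic such as the median pairwise distance). It computes the unbiased estimate of the squared MMD, approximates the null distribution for a two-sample test by randomly permuting the pooled sample (one common choice among several), and includes a biased estimate of the Hilbert-Schmidt Independence Criterion (HSIC) as an example of a kernel dependence measure. All function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Drop the diagonal terms k(x_i, x_i) and k(y_i, y_i) from the within-sample means.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.sum() / (m * n)
    return term_xx + term_yy - 2.0 * term_xy

def mmd_permutation_test(x, y, sigma=1.0, n_perms=200, seed=0):
    """Return (MMD^2 estimate, permutation p-value) for H0: x and y share a distribution."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(x, y, sigma)
    pooled = np.concatenate([x, y], axis=0)
    m = len(x)
    count = 0
    for _ in range(n_perms):
        # Randomly reassign pooled points to the two samples and recompute the statistic.
        idx = rng.permutation(len(pooled))
        if mmd2_unbiased(pooled[idx[:m]], pooled[idx[m:]], sigma) >= observed:
            count += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (count + 1) / (n_perms + 1)

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased HSIC estimate tr(K H L H) / (m - 1)^2 for paired samples x (m, p), y (m, q)."""
    m = len(x)
    k = gaussian_kernel(x, x, sigma_x)
    l = gaussian_kernel(y, y, sigma_y)
    h = np.eye(m) - np.ones((m, m)) / m  # centering matrix H
    return np.trace(k @ h @ l @ h) / (m - 1) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=(200, 2))
    y_same = rng.normal(0.0, 1.0, size=(200, 2))
    y_shift = rng.normal(1.0, 1.0, size=(200, 2))
    print("same distribution:", mmd_permutation_test(x, y_same))
    print("shifted mean:", mmd_permutation_test(x, y_shift))
    noise = rng.normal(size=(200, 2))
    print("HSIC dependent:", hsic_biased(x, x + 0.1 * noise))
    print("HSIC independent:", hsic_biased(x, noise))
```

With a shifted mean the p-value should be small, while for samples drawn from the same distribution it is typically not. The permutation test is used here only because it is simple and makes no distributional assumptions; it costs one MMD evaluation per permutation.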