Painless embeddings of distributions: the function space view
Alex Smola, Arthur Gretton, and Kenji Fukumizu, Helsinki, July 5th, 2008
Slides
The three parts will be given by Arthur Gretton (Part 1), Alex Smola (Part 2), and Kenji Fukumizu (Part 3).
Objective of tutorial
This tutorial introduces recent understanding and methodology in kernel methods for dealing with higher-order statistics by painlessly embedding random variables and probability distributions.
In the early days of kernel machines research, the "kernel trick" was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics, by embedding distributions in a suitable reproducing kernel Hilbert space (RKHS). Notably, unlike an explicit expansion into higher-order moments or the conventional characteristic function approach, kernels provide a painless, tractable way of embedding distributions.
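To make "embedding a distribution in an RKHS" concrete: for a kernel k, a sample x_1, ..., x_m is mapped to its empirical mean embedding mu_X = (1/m) sum_i k(x_i, .). By the reproducing property, the squared RKHS distance between two such embeddings expands into averages of kernel evaluations, ||mu_X - mu_Y||^2 = mean k(x_i, x_j) + mean k(y_i, y_j) - 2 mean k(x_i, y_j), so no explicit feature map or moment expansion is needed. The sketch below is illustrative only: it assumes NumPy, a Gaussian kernel with bandwidth sigma = 1, and hypothetical helper names that do not come from the tutorial.

```python
import numpy as np


def gaussian_gram(a, b, sigma=1.0):
    """Gram matrix with entries exp(-||a_i - b_j||^2 / (2 sigma^2)); rows are samples."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def embedding_distance_sq(x, y, sigma=1.0):
    """Squared RKHS distance ||mu_X - mu_Y||^2 between empirical mean embeddings.

    mu_X = (1/m) sum_i k(x_i, .). Expanding the squared norm with the
    reproducing property leaves only averages of kernel evaluations
    (this is the biased estimate of the squared MMD).
    """
    kxx = gaussian_gram(x, x, sigma)
    kyy = gaussian_gram(y, y, sigma)
    kxy = gaussian_gram(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(200, 1))  # sample from N(0, 1)
y = rng.normal(0.0, 1.0, size=(200, 1))  # another sample from N(0, 1)
z = rng.normal(1.0, 1.0, size=(200, 1))  # sample from N(1, 1)
print("same distribution:   ", embedding_distance_sq(x, y))
print("shifted distribution:", embedding_distance_sq(x, z))
```

The first value should be close to zero (it carries a small positive bias of order 1/m), while the second should be clearly larger.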
This line of reasoning leads naturally to several questions: What does it mean to embed a distribution in an RKHS? When is this embedding injective (that is, when do distinct distributions have distinct embeddings)? What are the implications for learning algorithms that use these embeddings? This tutorial aims to answer these questions.
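The injectivity question can be seen numerically. With a linear kernel k(u, v) = uv, the mean embedding of a one-dimensional sample is just its sample mean, so two distributions with equal means (here N(0, 1) and N(0, 4)) receive nearly identical embeddings. The Gaussian kernel is characteristic on R^d, meaning its mean embedding is injective, so it separates them. A minimal sketch under illustrative assumptions (NumPy, bandwidth 1, hypothetical helper names):

```python
import numpy as np


def linear_gram(a, b):
    # k(u, v) = <u, v>: the mean embedding reduces to the sample mean.
    return a @ b.T


def gaussian_gram(a, b, sigma=1.0):
    # k(u, v) = exp(-||u - v||^2 / (2 sigma^2)), a characteristic kernel on R^d.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def embedding_distance_sq(x, y, gram):
    """Squared RKHS distance between empirical mean embeddings under kernel `gram`."""
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()


rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=(2000, 1))  # N(0, 1)
y = rng.normal(0.0, 2.0, size=(2000, 1))  # N(0, 4): same mean, different spread

d_linear = embedding_distance_sq(x, y, linear_gram)
d_gauss = embedding_distance_sq(x, y, gaussian_gram)
print(f"linear kernel:   {d_linear:.4f}")
print(f"Gaussian kernel: {d_gauss:.4f}")
```

With the linear kernel the distance equals the squared difference of the sample means, so it only sees first moments; a polynomial kernel of degree p would likewise see only moments up to order p.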
A wide variety of applications in machine learning and computer science require estimating and/or comparing distributions. They include:
- nonparametric hypothesis testing (of homogeneity, independence, and conditional independence), including on structured domains such as strings and graphs
- independent component analysis and kernel canonical correlation
- causal learning and determining the structure of graphical models
- data set squashing / data sketching / data anonymisation
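Homogeneity testing, the first application listed, can be sketched by thresholding the distance between embeddings with a permutation null: pool both samples, reshuffle the labels, and count how often the shuffled statistic reaches the observed one. The permutation calibration, Gaussian kernel, bandwidth, permutation count, and function names are all illustrative assumptions here, not necessarily the thresholds used in the tutorial itself.

```python
import numpy as np


def gaussian_gram(a, b, sigma=1.0):
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_from_gram(k, m):
    """Biased squared MMD from a pooled Gram matrix whose first m rows/cols are sample X."""
    return k[:m, :m].mean() + k[m:, m:].mean() - 2.0 * k[:m, m:].mean()


def mmd_permutation_test(x, y, sigma=1.0, n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value) for H0: X and Y share a distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    k = gaussian_gram(pooled, pooled, sigma)  # evaluate the kernel only once
    m = x.shape[0]
    observed = mmd2_from_gram(k, m)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])  # random relabelling of the pooled sample
        if mmd2_from_gram(k[np.ix_(idx, idx)], m) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(100, 1))
y = rng.normal(0.8, 1.0, size=(100, 1))
stat, p = mmd_permutation_test(x, y)
print(f"MMD^2 = {stat:.4f}, p = {p:.4f}")
```

Precomputing the pooled Gram matrix once and permuting its rows and columns avoids re-evaluating the kernel for every permutation.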
Speakers and Outline
- Arthur Gretton (MPI for Biological Cybernetics, Germany)
Fundamentals of distribution embedding and characteristic kernels
- Introduction to Distribution Embeddings
- Characteristic kernels and injective embeddings
- Two-sample tests that check whether the difference between embeddings is significant
- Independence tests
- Alexander J. Smola (NICTA, Australia)
Applications of Dependence Tests
- Independent Component Analysis
- Feature Selection
- Clustering and Feature Extraction
- Nonparametric Sorting
- Colored Maximum Variance Unfolding
- Kenji Fukumizu (Institute of Statistical Mathematics, Tokyo)
Conditional covariance, conditional independence, and causality
- Conditional covariance on RKHS
- Conditional independence test
- Application to causal inference
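The independence tests and dependence-based applications in the outline need a dependence measure built from embeddings. The outline does not name one; as an assumption, the sketch below uses the biased empirical Hilbert-Schmidt Independence Criterion, HSIC = tr(K H L H) / n^2, where K and L are Gram matrices on the x and y samples and H = I - (1/n) 1 1^T centers them. With Gaussian kernels, the population HSIC is zero exactly when x and y are independent, so it also detects nonlinear dependence such as y = x^2 that correlation misses. NumPy, the bandwidths, and the helper names are illustrative choices.

```python
import numpy as np


def gaussian_gram(a, sigma=1.0):
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(a**2, axis=1)[None, :] - 2.0 * a @ a.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, tr(K H L H) / n^2, with Gaussian kernels on x and y."""
    n = x.shape[0]
    k = gaussian_gram(x, sigma_x)
    l = gaussian_gram(y, sigma_y)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix H = I - (1/n) 1 1^T
    return np.trace(k @ h @ l @ h) / n**2


rng = np.random.default_rng(0)
x = rng.normal(size=(300, 1))
# x and x**2 are uncorrelated in the population (x is symmetric) but dependent.
y_dep = x**2 + 0.1 * rng.normal(size=(300, 1))
y_ind = rng.normal(size=(300, 1))  # independent of x
print("Pearson corr(x, y_dep):", np.corrcoef(x[:, 0], y_dep[:, 0])[0, 1])
print("HSIC dependent:  ", hsic_biased(x, y_dep))
print("HSIC independent:", hsic_biased(x, y_ind))
```

Turning the statistic into a test would again need a null distribution, for example by permuting y relative to x as in a two-sample permutation test.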