Painless Embeddings of Distributions

ICML 2008 Tutorial – The Function Space View

Speakers: Alex Smola, Arthur Gretton, and Kenji Fukumizu

Location: Helsinki, July 5, 2008

Slides

The three parts will be given by Arthur Gretton (Part 1), Alex Smola (Part 2), and Kenji Fukumizu (Part 3).

Objective of Tutorial

This tutorial introduces recent understanding and methodology in kernel methods: dealing with higher-order statistics by painlessly embedding random variables and probability distributions.

In the early days of kernel machines research, the “kernel trick” was considered a useful way of constructing nonlinear algorithms from linear ones. More recently, however, it has become clear that a potentially more far-reaching use of kernels is as a linear way of dealing with higher-order statistics by embedding distributions in a suitable reproducing kernel Hilbert space (RKHS). Notably, unlike the straightforward expansion of higher-order moments or the conventional characteristic function approach, the use of kernels and RKHSs provides a painless, tractable way of embedding distributions.

This line of reasoning leads naturally to the questions: what does it mean to embed a distribution in an RKHS? When is this embedding injective (that is, when do different distributions map to different elements of the RKHS)? What implications are there for learning algorithms that make use of these embeddings? This tutorial aims to answer these questions.
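As a rough illustration of the first question, the sketch below represents a sample by its empirical embedding, the function t -> (1/n) sum_i k(x_i, t), and compares two samples through the squared RKHS distance between their embeddings (often called the maximum mean discrepancy). It is a minimal sketch under stated assumptions, not the speakers' implementation: the function names, the one-dimensional data, the Gaussian kernel, and the bandwidth sigma=1.0 are all choices made here for illustration.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on real numbers (illustrative choice of kernel)."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mean_embedding(sample, kernel=gaussian_kernel):
    """Return the empirical embedding t -> (1/n) * sum_i k(x_i, t) as a function."""
    n = len(sample)
    return lambda t: sum(kernel(x, t) for x in sample) / n


def mmd2_biased(xs, ys, kernel=gaussian_kernel):
    """Biased estimate of the squared RKHS distance between two empirical embeddings.

    Expands ||mu_X - mu_Y||^2 = <mu_X, mu_X> + <mu_Y, mu_Y> - 2 <mu_X, mu_Y>,
    where each inner product is an average of kernel evaluations.
    """
    def avg(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    return avg(xs, xs) + avg(ys, ys) - 2.0 * avg(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    same_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
    same_b = [rng.gauss(0.0, 1.0) for _ in range(200)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]
    # Samples from the same distribution should give a much smaller value
    # than samples from distributions with different means.
    print("same distribution:   ", mmd2_biased(same_a, same_b))
    print("shifted distribution:", mmd2_biased(same_a, shifted))
```

Everything is computed from kernel evaluations alone, so no explicit moment expansion is needed. Whether a distance of zero implies that the two distributions are equal depends on the kernel, which is the injectivity question raised above.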

Applications

A great variety of applications in machine learning and computer science require distribution estimation and/or comparison. They include:

Nonparametric hypothesis testing (of homogeneity, independence, and conditional independence). These tests can also be applied to structured domains, such as strings and graphs.
Independent component analysis and kernel canonical correlation
Causal learning and determining the structure of graphical models
Data set squashing / data sketching / data anonymisation
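To make the independence-testing application concrete, the following sketch computes a kernel dependence statistic (a biased empirical Hilbert-Schmidt Independence Criterion) and calibrates it with a permutation test. It is an illustrative sketch, not the procedure presented in the tutorial: the function names, the Gaussian kernel with sigma=1.0, the 1/n^2 normalisation, and the permutation-based p-value are assumptions made here.

```python
import math
import random


def gram(sample, sigma=1.0):
    """Gram matrix of a Gaussian kernel on real numbers (illustrative kernel choice)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in sample]
            for a in sample]


def center(K):
    """Double-centre a symmetric Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_mean = [sum(r) / n for r in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased statistic (1/n^2) * trace(K H L H) for paired samples xs, ys."""
    n = len(xs)
    Kc = center(gram(xs, sigma))
    L = gram(ys, sigma)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2


def permutation_pvalue(xs, ys, n_perm=200, seed=0):
    """Estimate a p-value by shuffling ys, which breaks any dependence on xs."""
    rng = random.Random(seed)
    observed = hsic_biased(xs, ys)
    ys_perm = list(ys)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(ys_perm)
        if hsic_biased(xs, ys_perm) >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly positive.
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(40)]
    dependent = [x ** 2 + 0.1 * rng.gauss(0.0, 1.0) for x in xs]
    independent = [rng.gauss(0.0, 1.0) for _ in xs]
    print("dependent pair p-value:  ", permutation_pvalue(xs, dependent))
    print("independent pair p-value:", permutation_pvalue(xs, independent))
```

The dependent pair in the demo is uncorrelated in the linear sense (y is roughly x squared), which is the kind of higher-order dependence a kernel statistic is meant to pick up. Because the statistic only needs a kernel on each domain, the same code shape would apply to strings or graphs given suitable kernels.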

Speakers and Outline

Arthur Gretton (MPI for Biological Cybernetics, Germany) Fundamentals of distribution embedding and characteristic kernels
- Introduction to Distribution Embeddings
- Characteristic kernels and injective embeddings
- Two-sample tests which check whether the difference of embeddings is significant
- Independence tests
Alexander J. Smola (NICTA, Australia) Applications of Dependence Tests
- Independent Component Analysis
- Feature Selection
- Clustering and Feature Extraction
- Nonparametric Sorting
- Colored Maximum Variance Unfolding
Kenji Fukumizu (Institute of Statistical Mathematics, Tokyo) Conditional covariance, conditional independence, and causality
- Conditional covariance on RKHS
- Conditional independence test
- Application to causal inference

Relevant Tutorials and Overviews

Alex Smola: “Introduction to Kernel Methods”, MLSS 2007, Tübingen (video)
Arthur Gretton: “Measures of Statistical Dependence”, MLSS 2006, Canberra (video)
Kenji Fukumizu: “Kernel Methods for Dependence and Causality”, MLSS 2007, Tübingen (video)