Schedule and Abstracts
7:45–8:00 Overview by Organizers
8:00–8:30 Invited Talk: Kernel Bayes' Rule
Kenji Fukumizu, The Institute of Statistical Mathematics
A nonparametric kernel-based method for realizing Bayes’ rule is proposed, based on representations of probabilities in reproducing kernel Hilbert spaces. Probabilities are uniquely characterized by the mean of the canonical map to the RKHS. The prior and conditional probabilities are expressed in terms of RKHS functions of an empirical sample: no explicit parametric model is needed for these quantities. The posterior is likewise an RKHS mean of a weighted sample. The estimator for the expectation of a function of the posterior is derived, and rates of consistency are shown. Some representative applications of the kernel Bayes’ rule are presented, including Bayesian computation without likelihood and filtering with a nonparametric state-space model.
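To make the representation concrete, the following minimal NumPy sketch (illustrative, not from the talk) implements the conditional mean embedding, the building block underlying the kernel Bayes' rule; the full rule adds a prior-correction step omitted here, and the Gaussian kernel, bandwidth sigma, and regularizer lam are assumed choices.

    import numpy as np

    def gaussian_gram(A, B, sigma):
        # Gram matrix k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def posterior_expectation(X, Y, y_obs, f, sigma=1.0, lam=1e-3):
        # E[f(X) | Y = y_obs] from a joint sample {(X_i, Y_i)}: the posterior
        # is represented as an RKHS mean of the weighted sample {(w_i, X_i)}.
        n = len(X)
        G_Y = gaussian_gram(Y, Y, sigma)
        k_y = gaussian_gram(Y, y_obs[None, :], sigma).ravel()
        w = np.linalg.solve(G_Y + n * lam * np.eye(n), k_y)  # sample weights
        return w @ f(X)

    # Toy check: X ~ N(0,1), Y = X + N(0,1); E[X | Y=1] should be near 0.5.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 1))
    Y = X + rng.normal(size=(500, 1))
    print(posterior_expectation(X, Y, np.array([1.0]), lambda x: x.ravel()))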
8:30–8:40 Talk: FastFood: Approximating Kernel Expansions in Loglinear Time
Alex Smola, Research at Google
The ability to evaluate nonlinear function classes rapidly is crucial for nonparametric estimation. We propose an improvement to random kitchen sinks that offers O(n log d) computation and O(n) storage for n basis functions in d dimensions without sacrificing accuracy. We show how one may adjust the regularization properties of the kernel simply by changing the spectral distribution of the projection matrix. Experiments show that we achieve accuracy identical to full kernel expansions and random kitchen sinks while running 100x faster and using 1000x less memory.
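As a rough sketch of the construction (not the authors' implementation; all names and constants below are illustrative), the dense Gaussian projection of random kitchen sinks is replaced by a product of diagonal and Hadamard matrices:

    import numpy as np
    from scipy.linalg import hadamard

    def fastfood_features(X, sigma, rng):
        # Replace the dense Gaussian projection W of random kitchen sinks
        # with V = S H G P H B: H is the Walsh-Hadamard matrix; S, G, B are
        # diagonal; P is a permutation. Only O(d) parameters are stored.
        # An explicit Hadamard matrix is used here for readability; the
        # loglinear cost requires the fast Walsh-Hadamard transform instead.
        n, d = X.shape                              # d must be a power of two
        H = hadamard(d).astype(float)
        B = rng.choice([-1.0, 1.0], size=d)         # random signs
        P = rng.permutation(d)                      # random permutation
        G = rng.normal(size=d)                      # Gaussian weights
        s = np.sqrt(rng.chisquare(d, size=d))       # row norms ~ chi_d, as for dense W
        Z = (X * B) @ H                             # H B x
        Z = Z[:, P]                                 # P H B x
        Z = (Z * G) @ H                             # H G P H B x
        Z *= s / (np.linalg.norm(G) * np.sqrt(d))   # S: rescale rows to chi_d norms
        Z /= sigma                                  # match the N(0, I/sigma^2) spectrum
        b = rng.uniform(0, 2 * np.pi, size=d)       # random phases (kitchen sinks)
        return np.sqrt(2.0 / d) * np.cos(Z + b)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 16))
    phi = fastfood_features(X, sigma=2.0, rng=rng)
    print(phi @ phi.T)      # roughly exp(-||x - y||^2 / (2 sigma^2))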
8:40–9:00 Coffee Break
9:00–9:30 Invited Talk: Small-Variance Asymptotics, Nonparametric Bayes, and Kernel K-means
Brian Kulis, Ohio State University
It is well known that mixture-of-Gaussians and k-means are related through asymptotics on the variance of the clusters—as the variance tends to zero, the EM algorithm becomes the k-means algorithm, and the complete-data log likelihood becomes the k-means objective function. As shown recently, such asymptotics can also be applied to Bayesian nonparametric models, leading to simple and scalable k-means-like algorithms for a host of problems including clustering, latent feature models, topic models, and others. In this talk, I will overview these results, with a focus on the connections to kernel methods. In particular, I will discuss how an existing equivalence between kernel k-means and graph clustering can be used in conjunction with the asymptotics of Bayesian nonparametric models to obtain a class of novel and scalable kernel-based algorithms for problems such as overlapping graph clustering and graph clustering when the number of clusters is not fixed.
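For concreteness, the best-known such algorithm is DP-means, the small-variance limit of Gibbs sampling in a Dirichlet process mixture of Gaussians; a minimal sketch (with an assumed penalty parameter lam governing when new clusters open) follows:

    import numpy as np

    def dp_means(X, lam, n_iter=25):
        # Small-variance limit of a DP mixture of Gaussians: like k-means,
        # but a point whose squared distance to every mean exceeds lam
        # opens a new cluster, so the number of clusters is inferred.
        means = [X.mean(axis=0)]              # start from one global cluster
        z = np.zeros(len(X), dtype=int)
        for _ in range(n_iter):
            for i, x in enumerate(X):         # sequential assignment step
                d2 = [np.sum((x - m) ** 2) for m in means]
                k = int(np.argmin(d2))
                if d2[k] > lam:               # penalty inherited from the DP prior
                    means.append(x.copy())
                    k = len(means) - 1
                z[i] = k
            keep = sorted(set(z))             # drop empty clusters, relabel
            z = np.array([keep.index(k) for k in z])
            means = [X[z == j].mean(axis=0) for j in range(len(keep))]
        return np.array(means), z

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.2, size=(50, 2)) for c in ([0, 0], [3, 3], [0, 3])])
    means, z = dp_means(X, lam=1.0)
    print(len(means))                         # discovers ~3 clusters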
9:30–10:00 Invited Talk: Kernel Methods in Nonparametric Bayesian Models
Lawrence Carin, Duke University
Methods such as Gaussian processes can be computationally challenging for large-scale problems. In this talk, we discuss how alternative kernel methods can be employed to accelerate computation without loss of modeling power. We examine this in the context of general nonparametric Bayesian models, with specific applications involving the beta process. Theoretical and algorithmic issues are discussed and demonstrated via several examples.
10:00–10:15 Contributed Talk: Kernel Embeddings of Dirichlet Process Mixtures
Krikamol Muandet, Max Planck Institute for Biological Cybernetics
10:15–16:00 Break
16:00–16:10 Contributed Talk: Kernel Methods for Learning Motion Patterns
Lachlan McCalman, University of Sydney
16:10–16:20 Contributed Talk: Kernels for Protein Structure Prediction
Narges Razavian, Carnegie Mellon University
16:20–16:30 Short Coffee Break
16:30–17:00 Invited Talk: Kernel Topic Models
Thore Graepel, Microsoft Research Cambridge
Latent Dirichlet Allocation models discrete data as a mixture of discrete distributions, using Dirichlet beliefs over the mixture weights. We study a variation of this concept in which the documents’ mixture weight beliefs are replaced with squashed Gaussian distributions. This allows documents to be associated with elements of a Hilbert space, yielding kernel topic models (KTMs) that can model temporal, spatial, hierarchical, social and other structure between documents. The main challenge is efficient approximate inference on the latent Gaussian. We present an approximate algorithm cast around a Laplace approximation in a transformed basis. The KTM can also be interpreted as a type of Gaussian process latent variable model, or as a topic model conditional on document features, uncovering links to earlier work in these areas. This is joint work with Philipp Hennig (first author), David Stern, and Ralf Herbrich.
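A generative caricature may help fix ideas; in the sketch below the softmax squashing, the RBF kernel over document features, and all names are illustrative stand-ins rather than the talk's actual model:

    import numpy as np

    def ktm_generate(doc_feats, topics, n_words, sigma=1.0, rng=None):
        # Each topic's document weights follow a GP over document features;
        # the per-document Gaussian values are squashed (here via softmax)
        # into mixture weights, replacing LDA's Dirichlet beliefs.
        rng = rng or np.random.default_rng()
        D = len(doc_feats)
        T, V = topics.shape                   # topics: T distributions over V words
        d2 = ((doc_feats[:, None, :] - doc_feats[None, :, :]) ** 2).sum(-1)
        K = np.exp(-d2 / (2 * sigma ** 2)) + 1e-6 * np.eye(D)
        eta = np.linalg.cholesky(K) @ rng.normal(size=(D, T))   # one GP draw per topic
        theta = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
        return theta, [rng.choice(V, size=n_words, p=theta[d] @ topics)
                       for d in range(D)]

    rng = np.random.default_rng(0)
    doc_feats = np.linspace(0, 5, 8)[:, None]          # e.g. document timestamps
    topics = np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]])
    theta, docs = ktm_generate(doc_feats, topics, n_words=20, rng=rng)
    print(theta)   # nearby documents get correlated topic weights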
17:00–17:30 Invited Talk: Nonparametric Variational Inference
Matt Hoffman, Adobe
Variational methods are widely used for approximate posterior inference. However, their use is typically limited to families of distributions that enjoy particular conjugacy properties. To circumvent this limitation, we propose a family of variational approximations inspired by nonparametric kernel density estimation. The locations of these kernels and their bandwidth are treated as variational parameters and optimized to improve an approximate lower bound on the marginal likelihood of the data. Unlike most other variational approximations, using multiple kernels allows the approximation to capture multiple modes of the posterior. We demonstrate the efficacy of the nonparametric approximation with a hierarchical logistic regression model and a nonlinear matrix factorization model. We obtain predictive performance as good as or better than more specialized variational methods and MCMC approximations. The method is easy to apply to graphical models for which standard variational methods are difficult to derive.
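The following toy sketch conveys the idea under simplifying assumptions: a shared bandwidth, plain Monte Carlo for the expected log joint (the actual method uses more refined approximations), and derivative-free optimization:

    import numpy as np
    from scipy.optimize import minimize

    def nvi_fit(log_p, dim, n_comp=5, n_samples=100, seed=0):
        # Variational family: uniform mixture of isotropic Gaussian kernels.
        # Kernel locations and a shared log-bandwidth are optimized to raise
        # an approximate lower bound on the marginal likelihood; the entropy
        # term uses the standard Jensen lower bound for mixtures.
        rng = np.random.default_rng(seed)
        eps = rng.normal(size=(n_comp, n_samples, dim))   # common random numbers

        def neg_bound(params):
            mu = params[:-1].reshape(n_comp, dim)
            s = np.exp(params[-1])                        # shared bandwidth
            x = (mu[:, None, :] + s * eps).reshape(-1, dim)
            e_logp = np.mean([log_p(xi) for xi in x])     # Monte Carlo E_q[log p]
            d2 = ((mu[:, None] - mu[None, :]) ** 2).sum(-1)
            q2 = np.exp(-d2 / (4 * s ** 2)) / (4 * np.pi * s ** 2) ** (dim / 2)
            ent = -np.mean(np.log(q2.mean(axis=1)))       # Jensen entropy bound
            return -(e_logp + ent)

        x0 = np.append(rng.normal(size=n_comp * dim), 0.0)
        res = minimize(neg_bound, x0, method="Nelder-Mead", options={"maxiter": 5000})
        return res.x[:-1].reshape(n_comp, dim), np.exp(res.x[-1])

    # Bimodal target: equal mixture of N(-2, 1) and N(2, 1); the kernel
    # mixture can cover both modes, unlike a single Gaussian approximation.
    log_p = lambda x: np.logaddexp(-0.5 * (x[0] - 2) ** 2,
                                   -0.5 * (x[0] + 2) ** 2) - np.log(2 * np.sqrt(2 * np.pi))
    mu, s = nvi_fit(log_p, dim=1)
    print(np.sort(mu.ravel()), s)   # components should land near -2 and +2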
17:30–18:00 Coffee Break
18:00–18:30 Invited Talk: Determinantal Point Processes
Ben Taskar, University of Pennsylvania
Determinantal point processes (DPPs) arise in random matrix theory and quantum physics as models of random variables with negative correlations. Among many remarkable properties, they offer tractable algorithms for exact inference, including computing marginals, computing certain conditional probabilities, and sampling. DPPs are a natural model for subset selection problems where diversity is preferred. For example, they can be used to select diverse sets of sentences to form document summaries, or to return relevant but varied text and image search results, or to detect non-overlapping multiple object trajectories in video. In our recent work, we discovered a novel factorization and dual representation of DPPs that enables efficient inference for exponentially-sized structured sets. We developed a new inference algorithm based on Newton identities for DPPs conditioned on subset size. We also derived efficient parameter estimation for DPPs from several types of observations.
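For the basics, a compact NumPy sketch of the standard exact DPP sampler (Hough et al.), together with the tractable marginals, is given below; it does not implement the structured-set or size-conditioned algorithms described above:

    import numpy as np

    def dpp_sample(L, rng):
        # Exact DPP sampling (Hough et al.): select eigenvectors of L
        # independently with probability lam/(1+lam), then pick items one
        # at a time, projecting the span away from each chosen item; the
        # projection is what induces the negative correlations.
        lam, V = np.linalg.eigh(L)
        V = V[:, rng.random(len(lam)) < lam / (1 + lam)]
        items = []
        while V.shape[1] > 0:
            p = (V ** 2).sum(axis=1)
            p /= p.sum()                                # inclusion probabilities
            i = rng.choice(len(p), p=p)
            items.append(i)
            j = np.argmax(np.abs(V[i]))                 # column with V[i, j] != 0
            V = V - np.outer(V[:, j], V[i] / V[i, j])   # zero out row i (and column j)
            V = np.delete(V, j, axis=1)
            if V.shape[1] > 0:
                V, _ = np.linalg.qr(V)                  # re-orthonormalize
        return sorted(items)

    rng = np.random.default_rng(0)
    A = rng.normal(size=(5, 5))
    L = A @ A.T                                    # any PSD matrix defines a DPP
    K = L @ np.linalg.inv(L + np.eye(5))           # marginal kernel
    print(np.diag(K))                              # P(i in S) = K_ii
    print(dpp_sample(L, rng))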
18:30–19:00 Invited Talk: Connection between Kernel Embedding and Bayesian/Gaussian Process Methods
David Duvenaud, Cambridge University