Machine Learning Summer School 2002

Abstract

The course begins with an overview of the basic assumptions underlying Bayesian estimation. We introduce the notion of a prior distribution, which encodes how plausible we consider a certain estimate to be before any data are observed, and the concept of the posterior probability, which quantifies how plausible functions appear after we observe some data. Subsequently we show how inference is performed, and how some of the numerical problems that arise can be alleviated by various types of Maximum-a-Posteriori (MAP) estimation.
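
As a brief illustration of the quantities involved (the notation below is a generic sketch rather than the notation used in the course slides): for a hypothesis f and observed data D, Bayes' rule combines the prior p(f) with the likelihood p(D|f) to yield the posterior, and MAP estimation picks the mode of that posterior,

    p(f \mid D) = \frac{p(D \mid f)\, p(f)}{p(D)},
    \qquad
    f_{\mathrm{MAP}} = \operatorname*{argmax}_{f}\, p(D \mid f)\, p(f)
                     = \operatorname*{argmin}_{f}\, \bigl[ -\log p(D \mid f) - \log p(f) \bigr].

The second form shows why MAP estimation often reduces to regularized risk minimization: the negative log likelihood plays the role of a loss term and the negative log prior that of a regularizer.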

Once the basic tools are introduced, we analyze the specific properties of Bayesian estimators for three different types of priors: Gaussian Processes (including a description of the theory and of efficient means of implementation), which rely on the assumption that adjacent coefficients are correlated; Laplacian Processes, which assume that estimates can be expanded into a sparse linear combination of kernel functions and therefore favor such hypotheses; and Relevance Vector Machines, which assume that the contribution of each kernel function is governed by a normal distribution with its own variance.
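
To give a concrete feel for the Gaussian Process case, the following short NumPy sketch computes the GP regression posterior mean and variance on a toy one-dimensional problem. The Gaussian (RBF) covariance function, its length scale, the noise level, and the function names are illustrative choices for this sketch, not the specific settings or code used in the course.

    import numpy as np

    def rbf_kernel(X1, X2, length_scale=1.0):
        """Gaussian (RBF) covariance: k(x, x') = exp(-|x - x'|^2 / (2 l^2))."""
        d2 = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * d2 / length_scale ** 2)

    def gp_posterior(X_train, y_train, X_test, noise=0.1, length_scale=1.0):
        """Posterior mean and variance of GP regression with Gaussian noise."""
        K = rbf_kernel(X_train, X_train, length_scale) + noise ** 2 * np.eye(len(X_train))
        K_star = rbf_kernel(X_test, X_train, length_scale)
        K_ss = rbf_kernel(X_test, X_test, length_scale)
        # Cholesky factorization of the (regularized) kernel matrix for stability.
        L = np.linalg.cholesky(K)
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
        mean = K_star @ alpha
        v = np.linalg.solve(L, K_star.T)
        var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
        return mean, var

    # Toy example: noisy samples of a sine function.
    rng = np.random.default_rng(0)
    X_train = np.linspace(0, 5, 20)
    y_train = np.sin(X_train) + 0.1 * rng.standard_normal(20)
    X_test = np.linspace(0, 5, 100)
    mean, var = gp_posterior(X_train, y_train, X_test)

The cubic cost of the Cholesky factorization in the number of training points is precisely what motivates the low rank methods and the Bayes Committee Machine discussed in the implementation units below.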

Prerequisites

  • Elementary Linear Algebra
  • Calculus
  • Experience with Bayesian Methods is beneficial, but not required.
  • Experience with Kernel Methods is likewise beneficial, but not required.

Contents

  • Unit 1: Bayes Rule, Approximate Inference, Hyperparameters
  • Unit 2: Gaussian Processes, Covariance Function, Kernel
  • Unit 3: GP: Regression
  • Unit 4: GP: Classification
  • Unit 5: Implementation: Laplace Approximation, Low Rank Methods
  • Unit 6: Implementation: Low Rank Methods, Bayes Committee Machine
  • Unit 7: Relevance Vector Machine: Priors on Coefficients
  • Unit 8: Relevance Vector Machine: Efficient Optimization and Extensions
  • Lab