ICANN 2001, Vienna, Austria, August 21, 2001. Tutorial Slides, Talk Slides

Alex Smola, RSISE, Machine Learning Group, Australian National University, Canberra

## Abstract

Support Vector Machines and related Bayesian kernel methods, such as Gaussian Processes and the Relevance Vector Machine, have been deployed successfully in classification and regression tasks. They work by mapping the data into a high-dimensional feature space and computing linear functions on the features. This makes them easily accessible to optimization and theoretical analysis. The algorithmic advantage is that the optimization problems arising from Support Vector Machines have a global minimum and can be solved with standard quadratic programming tools. Furthermore, the parametrization of kernel methods tends to be rather intuitive for the user.
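To make the idea of "linear functions in a feature space" concrete, here is a minimal sketch in plain Python. It assumes a homogeneous quadratic kernel on two-dimensional inputs, chosen only because its feature map is small enough to write out; the function names (`phi`, `poly2_kernel`, `kernel_expansion`, `explicit_linear`) are illustrative and not taken from the tutorial. The sketch shows that a kernel expansion over the data points gives the same value as an explicit linear function of the mapped features.

```python
import math


def dot(a, b):
    """Standard inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit feature map for the kernel k(x, y) = <x, y>^2 on R^2.

    Illustrative choice: (x1^2, sqrt(2)*x1*x2, x2^2).
    """
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, y):
    """Evaluate k(x, y) = <x, y>^2 without forming phi(x) or phi(y)."""
    return dot(x, y) ** 2


def kernel_expansion(alphas, points, x, kernel=poly2_kernel):
    """f(x) = sum_i alpha_i * k(x_i, x), using kernel evaluations only."""
    return sum(a * kernel(p, x) for a, p in zip(alphas, points))


def explicit_linear(alphas, points, x):
    """Same function, computed as <w, phi(x)> with w = sum_i alpha_i * phi(x_i)."""
    w = [0.0, 0.0, 0.0]
    for a, p in zip(alphas, points):
        for j, v in enumerate(phi(p)):
            w[j] += a * v
    return dot(w, phi(x))


# Hypothetical coefficients and points, used only to compare the two routes.
alphas = [0.5, -1.0]
points = [(1.0, 2.0), (3.0, -1.0)]
query = (2.0, 1.0)

f_kernel = kernel_expansion(alphas, points, query)
f_explicit = explicit_linear(alphas, points, query)
```

Both routes agree up to floating-point rounding. The kernel route never builds the feature vectors, which is what keeps the approach practical when the feature space is very high-dimensional.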

In this tutorial, I will introduce the basic theory of Support Vector Machines and some recent extensions. Moreover, I will present a few simple algorithms to solve the optimization problems in practice.

## Outline

### Linear Estimators

- Discriminant Analysis
- Support Vector Classification
- Least Mean Squares Regression
- Support Vector Regression
- Novelty Detection

### Kernels

- Feature Extraction
- Feature Spaces and Kernels
- Examples of General-Purpose Kernels
- Special-Purpose Kernels (Discriminative Models, Texts, Trees, Images)
- Kernels and Regularization
- Test Criteria for Kernels

### Optimization

- Newton’s Method
- Quadratic Optimizers
- Chunking and SMO
- Online Methods

### Bayesian Methods

- Bayesian Basics
- A Gaussian Process View
- Likelihood, Posterior Probabilities and the MAP Approximation
- Hyperparameters
- Algorithms