Singapore, October 4-5, 2004 and Kuala Lumpur, October 6-7, 2004.

Alex Smola, RSISE, Machine Learning Group, Australian National University, Canberra

## Day 1

**Lecture 1: Introduction to Machine Learning and Probability Theory**

We introduce the concept of machine learning, as it is used to solve problems of pattern recognition, classification, regression, novelty detection and data cleaning. Subsequently we give a primer on probabilities, Bayes rule and inference (hypothesis testing for disease diagnosis).

**Lecture 2: Density Estimation and Parzen Windows**

We begin with a simple density estimator, Parzen Windows, which is very easy to implement because it requires essentially no training step before it can be used. We give a simple rule for tuning the parameters of the estimator and discuss Silverman’s rule, the Watson-Nadaraya Estimator for classification and regression, and cross-validation. Examples and applications conclude this lecture.

**Lecture 3: The Perceptron and Kernels**

A slightly more complex classifier is the Perceptron, which produces a linear separation of sets. We explain the algorithm and show its properties and implementation details. Subsequently we modify the algorithm to allow for nonlinear separation and multiclass discrimination. This leads us naturally to introduce kernels. Examples of kernels are given (more details follow on day 2 of the course).

**Lecture 4: Support Vector Classification**

Support Vector Machines are a more sophisticated method for solving the classification problem. We describe the optimization problem they solve and show their geometrical properties.
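
As a point of reference, the optimization problem is commonly written in the following soft-margin form. This is the standard textbook formulation and an assumption about notation, not necessarily the exact one used in the lecture: $w$ is the weight vector, $b$ the offset, $\xi_i$ are slack variables, $C > 0$ is the regularization constant, and $(x_i, y_i)$ with $y_i \in \{-1, +1\}$ are the $m$ training examples.

$$
\begin{aligned}
\min_{w, b, \xi} \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \\
\text{subject to} \quad & y_i\left(\langle w, x_i \rangle + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m
\end{aligned}
$$

Geometrically, the hyperplanes $\langle w, x \rangle + b = \pm 1$ lie a distance $2 / \|w\|$ apart, so minimizing $\|w\|^2$ maximizes the margin between the two classes.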

## Day 2

**Lecture 1: Kernel Methods for Text Categorization and Biological Sequence Analysis**

We describe the problem of text categorization, explain kernels on texts and biological sequences and show how they can be computed efficiently. We give practical examples for Remote Homology Detection and the Reuters database.

**Lecture 2: Optimization**

If one wants to implement SVMs oneself (or simply to control them better) one needs to understand how the optimization problem is solved. After a short primer on convex optimization methods we explain chunking and Sequential Minimal Optimization. We conclude with methods that are more advanced yet easy to implement, such as online learning with kernels.

**Lecture 3: Regression and Novelty Detection**

We continue with a description of methods for regression with kernels, namely the classical SV regression and regularized least mean squares regression. Implementation details are given. Subsequently we discuss kernel methods for novelty detection and database cleaning.

**Lecture 4: How to Get Good Results in Practice**

Clearly one of the key issues is how to obtain good results in practice. The course concludes with a bag of important practical tricks, such as the nu-trick for adjusting the regularization parameter, the median trick for adjusting the kernel, how to use cross-validation in practice, how to scale the data before optimization, and how to interpret more advanced issues such as the spectrum of the kernel matrix and the smoothness of the kernel itself.
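
To make one of these tricks concrete, here is a minimal sketch of the median trick for a Gaussian RBF kernel. It assumes one common convention, namely setting the kernel parameter `gamma` in $k(x, x') = \exp(-\gamma \|x - x'\|^2)$ to the inverse of the median squared pairwise distance; the lecture may use a different variant (for example the median of the unsquared distances). The function names `pairwise_sq_dists`, `median_trick_gamma` and `rbf_kernel_matrix` are hypothetical and chosen only for illustration.

```python
import numpy as np


def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 <x, y>
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clip tiny negative values caused by floating point round-off.
    return np.maximum(d2, 0.0)


def median_trick_gamma(X):
    """Assumed convention: gamma = 1 / median squared pairwise distance."""
    d2 = pairwise_sq_dists(X)
    # Use only distinct pairs (strict upper triangle), not the zero diagonal.
    upper = np.triu_indices(d2.shape[0], k=1)
    med = np.median(d2[upper])
    if med <= 0.0:
        raise ValueError("median pairwise distance is zero; data are degenerate")
    return 1.0 / med


def rbf_kernel_matrix(X, gamma):
    """Gaussian RBF kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    return np.exp(-gamma * pairwise_sq_dists(X))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    gamma = median_trick_gamma(X)
    K = rbf_kernel_matrix(X, gamma)
    print("gamma:", gamma, "kernel matrix shape:", K.shape)
```

With this choice a typical pair of points has a kernel value near $e^{-1}$, which keeps the kernel matrix away from the two degenerate extremes (close to the identity, or close to all ones). In practice the value is usually treated as a starting point and refined by cross-validation.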

## Prerequisites

Nothing beyond undergraduate knowledge in mathematics is expected. More specifically, I assume:

- Basic linear algebra (matrix inverse, eigenvector, eigenvalue, etc.)
- Some numerical mathematics (beneficial but not required), such as matrix factorization, conditioning, etc.
- Basic statistics and probability theory (Normal distribution, conditional distributions).
- (Optional) Some knowledge of Bayesian methods
- (Optional) Some knowledge of kernel methods