Introduction to Machine Learning (Slides: Lecture 1, Lecture 2, Lecture 3, Lecture 4, Lecture 5, Lecture 6)
Pune, January 3-7, 2007
National ICT Australia, Statistical Machine Learning Program, Canberra Laboratory
Lecture 1: Introduction to Machine Learning and Probability Theory
We introduce the concept of machine learning as it is used to solve problems of pattern recognition, classification, regression, novelty detection and data cleaning. Subsequently we give a primer on probabilities, Bayes rule and inference (hypothesis testing for disease diagnosis).
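The hypothesis-testing example can be sketched with Bayes rule. A minimal sketch follows; the prevalence, sensitivity and specificity figures are illustrative assumptions, not values from the lecture.

```python
# Bayes rule for disease diagnosis: P(disease | positive test).
# All numeric inputs below are made-up illustrative values.

def posterior_disease_given_positive(prior, sensitivity, specificity):
    """Apply Bayes rule: P(D | +) = P(+ | D) P(D) / P(+)."""
    p_pos_given_disease = sensitivity            # true positive rate
    p_pos_given_healthy = 1.0 - specificity      # false positive rate
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / evidence

# Even a very accurate test gives a modest posterior for a rare disease.
p = posterior_disease_given_positive(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 3))  # → 0.167
```

The low posterior despite a 99% sensitive test is the classic base-rate effect that makes this example instructive.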
Lecture 2: Density Estimation and Parzen Windows
We begin with a simple density estimator, Parzen windows, which can be implemented very easily since it requires essentially no training before it can be used. We give a simple rule for tuning the parameters of the estimator, discuss Silverman's rule and the Watson-Nadaraya estimator for classification and regression, and cover cross-validation. Examples and applications conclude this lecture.
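A minimal one-dimensional sketch of a Parzen window (kernel density) estimator with a Gaussian kernel; the data and the fixed bandwidth below are assumptions, whereas the lecture discusses principled bandwidth choices such as Silverman's rule and cross-validation.

```python
import math

# Parzen window estimate: average of Gaussian bumps centred on the samples.
# Sample data and bandwidth are illustrative assumptions.

def parzen_density(x, samples, bandwidth):
    """Estimate p(x) from samples with a Gaussian kernel of given bandwidth."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    return sum(norm * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
               for s in samples) / len(samples)

data = [0.1, 0.3, 0.35, 0.9, 1.1]
# Density is high near the cluster of samples, low far away from them.
print(parzen_density(0.3, data, bandwidth=0.2))
```

Note that no training step is needed: the samples themselves are the model, which is what makes the estimator so easy to implement.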
Lecture 3: The Perceptron and Kernels
A slightly more complex classifier is the perceptron, which produces a linear separation of sets. We explain the algorithm, show its properties, and discuss implementation details. Subsequently we modify the algorithm to allow for nonlinear separation and multiclass discrimination. This leads us naturally to introduce kernels. Examples of kernels are given (more details on that on day 2 of the course).
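The linear perceptron can be sketched in a few lines: cycle through the data and update the weights whenever a point is misclassified. The toy data and the epoch limit are assumptions for illustration.

```python
# Perceptron for binary linear classification (labels +1 / -1).
# Toy data and the epoch cap are illustrative assumptions.

def perceptron_train(points, labels, epochs=100):
    """Return (w, b) after mistake-driven updates; stops once separated."""
    dim = len(points[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            # Misclassified (or on the boundary): push w toward y * x.
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:   # a full pass with no errors: data is separated
            break
    return w, b

X = [(2.0, 1.0), (1.5, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
Y = [1, 1, -1, -1]
w, b = perceptron_train(X, Y)
```

The kernelized variant discussed in the lecture replaces the explicit weight vector by a sum of kernel evaluations against the mistake points, which is what allows nonlinear separation.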
Lecture 4: Support Vector Classification
Support Vector Machines are a more sophisticated method for solving the classification problem. We describe the optimization problem they solve and show their geometrical properties.
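The lecture derives the SVM as a constrained optimization problem; purely as an illustration of the same objective, here is a stochastic subgradient (Pegasos-style) sketch for the unbiased linear case. The toy data, regularization constant and epoch count are assumptions, and this is not the solver presented in the lecture.

```python
import random

# Stochastic subgradient descent on the regularized hinge loss
#   lam/2 * ||w||^2 + mean_i max(0, 1 - y_i <w, x_i>),
# i.e. the (unbiased) linear SVM objective. All settings are illustrative.

def svm_sgd(points, labels, lam=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    w = [0.0] * len(points[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(points)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)    # standard decaying step size
            x, y = points[i], labels[i]
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]   # regularizer shrinkage
            if margin < 1.0:         # hinge active: step toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

X = [(2.0, 1.0), (1.5, 2.0), (-1.0, -1.5), (-2.0, -0.5)]
Y = [1, 1, -1, -1]
w = svm_sgd(X, Y)
```

Unlike the perceptron, the hinge loss keeps pushing until every point clears a margin of one, which is the geometrical maximum-margin property the lecture emphasizes.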
Lecture 5: Regression and Novelty Detection
We continue with a description of methods for regression with kernels, namely the classical SV regression and regularized least mean squares regression. Implementation details are given. Subsequently we discuss kernel methods for novelty detection and database cleaning.
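As a toy illustration of regularized least squares (the one-dimensional linear case, not the kernelized version from the lecture), the problem without an intercept has a simple closed-form solution. The data and regularization constant below are assumptions.

```python
# Regularized least squares in one dimension, no intercept:
#   w = argmin_w  sum_i (y_i - w x_i)^2 + lam w^2
# which gives the closed form below. Data and lam are illustrative.

def ridge_1d(xs, ys, lam):
    """Closed-form solution of the 1-D regularized least squares problem."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]   # roughly y = 2x with noise
print(ridge_1d(xs, ys, lam=0.1))
```

In the kernelized setting the same regularized objective leads to a linear system in the kernel matrix rather than a scalar division, but the structure of the solution is analogous.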
Lecture 6: Structured Estimation
We introduce structured estimation, which allows us to extend SVM optimization to problems where the labels have a rich inherent structure, such as multiclass classification, sequence annotation, or web page ranking. We discuss the latter two subjects in detail, focusing on dynamic programming and linear assignment problems.
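The dynamic-programming flavour of sequence annotation can be illustrated with a Viterbi-style recursion that finds the highest-scoring label sequence given per-position and transition scores; the score tables below are made-up assumptions.

```python
# Viterbi-style dynamic program: best label sequence under additive scores.
# emit[t][s]  = score of assigning state s at position t (illustrative values)
# trans[s][p] = score of moving from state s to state p

def viterbi(emit, trans):
    """Return the highest-scoring state sequence by dynamic programming."""
    n_states = len(emit[0])
    score = list(emit[0])      # best score of each state at position 0
    back = []                  # backpointers for path recovery
    for t in range(1, len(emit)):
        new, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + trans[p][s])
            new.append(score[best_prev] + trans[best_prev][s] + emit[t][s])
            ptr.append(best_prev)
        score = new
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    path = [max(range(n_states), key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

emit = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
trans = [[1.0, 0.0], [0.0, 1.0]]
print(viterbi(emit, trans))
```

In structured SVM training this kind of recursion serves as the inner "argmax" step that finds the most violating label sequence at each iteration.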
Nothing beyond undergraduate mathematics is expected, though Lecture 6 is likely to be somewhat more difficult. More specifically, I assume:
- Basic linear algebra (matrix inverse, eigenvector, eigenvalue, etc.)
- Some numerical mathematics (beneficial but not required), such as matrix factorization, conditioning, etc.
- Basic statistics and probability theory (Normal distribution, conditional distributions).