SML: Scalable Machine Learning

STATISTICS 241B, COMPUTER SCIENCE C281B

Practical information

  • Volume: 3 hours per week (3 credits)

  • Time: Tuesday, 4-7pm (3 lectures in one block)

  • Location: 306 SODA

  • Instructor: Alex Smola (available 1-3pm Tuesdays in Evans 418)

  • TA: Dapo Omidiran

  • Grading Policy: Assignments (40%), Project (50%), Midterm project review (10%), Scribe (Bonus 5%)

  • Piazza discussion board

Updates

  • 041812 New set of assignments is online.

  • 041812 Slides for graphical models are online.

Overview

Scalable Machine Learning arises when Statistics, Systems, Machine Learning and Data Mining are combined into flexible, often nonparametric, and scalable techniques for analyzing large amounts of data at internet scale. This class aims to teach methods that will power the next generation of internet applications. The class will cover systems and processing paradigms, an introduction to statistical analysis, algorithms for data streams, generalized linear methods (logistic models, support vector machines, etc.), large scale convex optimization, kernels, graphical models and inference algorithms such as sampling and variational approximations, and explore/exploit mechanisms. Applications include social recommender systems, real time analytics, spam filtering, topic models, and document analysis.

Resources

Prerequisites

  • Basic probability and statistics. Having attended a machine learning class would be a big plus but is not absolutely required. In particular, some knowledge of kernels and graphical models would be useful.

  • Basic linear algebra (matrices, vectors, eigenvalues). Knowing functional analysis would be great but is not required.

  • Ability to write code that goes beyond 'Hello World', preferably in a language other than Matlab or R.

  • Basic knowledge of optimization. Having attended a convex optimization class would be great.