SML: Syllabus
STATISTICS 241B, COMPUTER SCIENCE 281B
Overview
Scalable Machine Learning occurs when Statistics, Systems, Machine Learning, and Data Mining are combined into flexible, often nonparametric, and scalable techniques for analyzing large amounts of data at internet scale. This class aims to teach methods that are going to power the next generation of internet applications. The class will cover systems and processing paradigms, an introduction to statistical analysis, algorithms for data streams, generalized linear methods (logistic models, support vector machines, etc.), large-scale convex optimization, kernels, graphical models, inference algorithms such as sampling and variational approximations, and explore/exploit mechanisms. Applications include social recommender systems, real-time analytics, spam filtering, topic models, and document analysis.
Lectures
Syllabus
1. Systems
2. Basic Statistics
3. Data Sketches and Streams
4. Optimization
5. Generalized Linear Models
6. Kernels and Regularization
7. Recommender Systems
8. Midterm Project Presentations
Maximum team size is 4, and a typical team should have 3 members.
Each team gets to pitch their project to the class for 10 minutes and hand in written documentation of at least 4 and at most 10 pages in a reasonable font size. You should be able to address the following criteria (adapted from Heilmeier's criteria for the purpose of this class). This type of reasoning will help you choose your own research agenda, write grants, convince colleagues, secure VC funding, and write papers.
9. Graphical Models
10. Latent Variable Model Templates
11. Structured Estimation
12. Large Scale Inference in Graphical Models
13. Applications
14. Explore Exploit
15. Final Project Presentations
Each team gets to give a final presentation of their project to the class. This may be a traditional talk, a demo, a product, an app, or any combination thereof. Make sure you discuss what you're doing, why you're doing it, how it differs from or improves on what is already available, and what it is good for.