Optimization (Recitation)
Content
Unconstrained problems
- Gradient descent
- Newton’s method
- Conjugate gradient descent
- Broyden-Fletcher-Goldfarb-Shanno (BFGS); see the sketch after this list
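As a quick illustration of the first two methods, here is a minimal NumPy sketch on a toy quadratic; the matrix, step size, and iteration counts are illustrative choices, not taken from the slides.

```python
# Gradient descent and Newton's method on f(x) = x'Ax/2 - b'x.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b                     # gradient of f

# Gradient descent: x <- x - eta * grad(x), with a fixed step size eta.
x = np.zeros(2)
for _ in range(100):
    x -= 0.1 * grad(x)

# Newton's method: x <- x - H^{-1} grad(x); for a quadratic, H = A,
# so a single step already lands on the minimizer.
x_newton = np.zeros(2)
for _ in range(5):
    x_newton -= np.linalg.solve(A, grad(x_newton))
```

Conjugate gradient and BFGS are usually taken off the shelf, e.g. `scipy.optimize.minimize(f, np.zeros(2), jac=grad, method='CG')` or `method='BFGS'`.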
Convexity
- Properties
- Lagrange function
- Wolfe dual (both of these are written out after this list)
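For reference, the standard definitions behind the last two bullets, for a problem min_x f(x) subject to g_i(x) <= 0 with f and g_i convex and differentiable:

```latex
% Lagrange function:
L(x, \lambda) = f(x) + \sum_i \lambda_i g_i(x), \qquad \lambda_i \ge 0

% Wolfe dual: maximize the Lagrangian subject to stationarity in x:
\max_{x, \lambda} \; L(x, \lambda)
\quad \text{s.t.} \quad \nabla_x L(x, \lambda) = 0, \quad \lambda \ge 0
```

Under the convexity assumptions above, weak duality holds between the primal and the Wolfe dual.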
Batch methods
- Distributed subgradient (sketched after this list)
- Bundle methods
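A minimal sketch of the distributed subgradient idea, with data shards standing in for workers; the hinge loss, shard count, and step size are illustrative assumptions.

```python
# Simulated distributed subgradient: each "worker" (data shard) computes a
# subgradient of its local average hinge loss; the master averages them.
import numpy as np

def hinge_subgradient(w, X, y):
    # A subgradient of (1/n) * sum_i max(0, 1 - y_i <w, x_i>) at w.
    active = y * (X @ w) < 1.0            # points where the hinge is active
    g = np.zeros_like(w)
    if active.any():
        g = -(X[active] * y[active, None]).sum(axis=0) / len(y)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.sign(X @ np.ones(5) + 0.1 * rng.normal(size=1000))

shards = np.array_split(np.arange(1000), 4)    # 4 simulated workers
w = np.zeros(5)
for t in range(1, 201):
    # Average the per-shard subgradients, then take a diminishing step.
    g = np.mean([hinge_subgradient(w, X[s], y[s]) for s in shards], axis=0)
    w -= (1.0 / np.sqrt(t)) * g
```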
Online methods
- Unconstrained subgradient
- Gradient projections (see the sketch after this list)
- Parallel optimization
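The projected variant in one picture: take a subgradient step on the current example's loss, then project back onto the feasible set. A minimal sketch, assuming an L2-ball constraint and hinge loss (both illustrative):

```python
# Online projected subgradient descent onto an L2 ball.
import numpy as np

def project_l2_ball(w, radius):
    # Euclidean projection onto {w : ||w||_2 <= radius}.
    norm = np.linalg.norm(w)
    return w if norm <= radius else (radius / norm) * w

rng = np.random.default_rng(1)
w, radius = np.zeros(5), 10.0
for t in range(1, 501):
    x_t = rng.normal(size=5)                    # one example arrives
    y_t = 1.0 if x_t.sum() > 0 else -1.0
    # Subgradient of the hinge loss max(0, 1 - y_t <w, x_t>) at w.
    g = -y_t * x_t if y_t * (x_t @ w) < 1.0 else np.zeros(5)
    w = project_l2_ball(w - g / np.sqrt(t), radius)   # step, then project
```

Dropping the projection gives the unconstrained subgradient method from the first bullet.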
Supplementary material
PDF slides for Stochastic Gradient Descent and Quadratic Programming. If you want to extract the equations from the slides, you can do so with LaTeXiT, simply by dragging the equation images into it.
- Boyd and Vandenberghe book (the default reference for convex optimization)
- Submodular optimization and applications site
- Nesterov and Vial paper on expected convergence
- Bartlett, Hazan, Rakhlin paper which uses strong convexity
- TAO (Toolkit for Advanced Optimization) site
- Ratliff, Bagnell, Zinkevich regret proof
- Shalev-Shwartz, Singer, Srebro Pegasos paper
- Langford, Smola, Zinkevich proof of multicore convergence
- Recht, Wright, Re proof of asynchronous updates in Hogwild
Videos
Unedited video, straight from a GF2 camera.