Dive into Deep Learning

Aston Zhang, Zachary Lipton, Mu Li and Alex Smola, 2022

This book covers code, math, examples and explanations in one piece. Some of the highlights:

  • Downloadable Jupyter notebooks. In fact, the entire book consists of notebooks.
  • A freely available PDF version
  • A GitHub repository to allow for fast corrections of errata
  • A tight integration with discussion forums to allow for questions regarding the math and code on the site
  • Theoretical background suitable for engineers and undergraduate researchers
  • State-of-the-art models (including ResNet, Faster R-CNN, etc.)
  • Well documented and structured code that is executed on real datasets, yet at the same time small enough to fit on a laptop.
  • A Chinese translation (in fact, the Chinese book will be released first)

Predicting Structured Data

Edited by Gökhan H. Bakir, Thomas Hofmann, Bernhard Schölkopf, Alexander J. Smola, Ben Taskar and S. V. N. Vishwanathan, MIT Press, 2006.

Machine learning develops intelligent computer systems that are able to generalize from previously seen examples. A new domain of machine learning, in which the prediction must satisfy the additional constraints found in structured data, poses one of machine learning’s greatest challenges: learning functional dependencies between arbitrary input and output domains. This volume presents and analyzes the state of the art in machine learning algorithms and theory in this novel field. The contributors discuss applications as diverse as machine translation, document markup, computational biology, and information extraction, among others, providing a timely overview of an exciting field.


Yasemin Altun, Gökhan Bakir, Olivier Bousquet, Sumit Chopra, Corinna Cortes, Hal Daume III, Ofer Dekel, Zoubin Ghahramani, Raia Hadsell, Thomas Hofmann, Fu Jie Huang, Yann LeCun, Tobias Mann, Daniel Marcu, David McAllester, Mehryar Mohri, William Stafford Noble, Fernando Perez-Cruz, Massimiliano Pontil, Marc’Aurelio Ranzato, Juho Rousu, Craig Saunders, Bernhard Schölkopf, Matthias W. Seeger, Shai Shalev-Shwartz, John Shawe-Taylor, Yoram Singer, Alexander J. Smola, Sandor Szedmak, Ben Taskar, Ioannis Tsochantaridis, S. V. N. Vishwanathan, and Jason Weston.

Proceedings of the Machine Learning Summer School 2002

Edited by Shahar Mendelson and Alexander J. Smola, Springer Verlag, LNCS 2600, 2003.

This book contains a collection of the main talks held at the Machine Learning Summer School at the Australian National University on February 11-22, 2002. It contains tutorial chapters on topics such as Boosting, Data Mining, Kernel Methods, Logic, Reinforcement Learning, and Statistical Learning Theory. The papers provide an in-depth overview of these exciting new areas, contain a large set of references, and thereby give the interested reader the information needed to start or pursue their own research in these directions.


Peter Bartlett, Markus Hegland, John Lloyd, Jyrki Kivinen, Gunnar Rätsch, Ron Meir, Shahar Mendelson, Bernhard Schölkopf and Alexander Smola.

Learning with Kernels

Support Vector Machines, Regularization, Optimization, and Beyond. Bernhard Schölkopf and Alexander J. Smola, MIT Press, 2002.

In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs - kernels - for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics. Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.

Advances in Large-Margin Classifiers

Edited by Peter J. Bartlett, Bernhard Schölkopf, Dale Schuurmans, and Alexander J. Smola, MIT Press, 2001.

The concept of large margins is a unifying principle for the analysis of many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and support vector machines. The fact that it is the margin, or confidence level, of a classification–that is, a scale parameter–rather than a raw training error that matters has become a key tool for dealing with classifiers. This book shows how this idea applies to both the theoretical analysis and the design of algorithms.


Olivier Chapelle, Nello Cristianini, Rainer Dietrich, Andre Elisseeff, Theodoros Evgeniou, Thore Graepel, Isabelle Guyon, Ralf Herbrich, Adam Kowalczyk, Yi Lin, Olvi Mangasarian, Mario Marchand, Klaus Obermayer, Nuria Oliver, Manfred Opper, Tomaso Poggio, Massimiliano Pontil, Pal Rujan, Bernhard Schölkopf, John Shawe-Taylor, Alex Smola, Haim Sompolinsky, David Stork, Grace Wahba, Chris Watkins, Jason Weston, Robert Williamson, Ole Winther, Vladimir Vapnik, Hao Zhang.

Advances in Kernel Methods - Support Vector Learning

Edited by Chris Burges, Bernhard Schölkopf and Alexander J. Smola, MIT Press, 1998.

The Support Vector Machine is a powerful new learning algorithm for solving a variety of learning and function estimation problems, such as pattern recognition, regression estimation, and operator inversion. The impetus for this collection was a workshop on Support Vector Machines held at the 1997 NIPS conference.


Peter Bartlett, Kristin P. Bennett, Christopher J. C. Burges, Nello Cristianini, Alex Gammerman, Federico Girosi, Simon Haykin, Thorsten Joachims, Linda Kaufman, Jens Kohlmorgen, Ulrich Kressel, Davide Mattera, Klaus-Robert Müller, Manfred Opper, Edgar E. Osuna, John C. Platt, Gunnar Rätsch, Bernhard Schölkopf, John Shawe-Taylor, Alexander J. Smola, Mark O. Stitson, Vladimir Vapnik, Volodya Vovk, Grace Wahba, Chris Watkins, Jason Weston, Robert C. Williamson.




AutoGluon automates machine learning tasks enabling you to easily achieve strong predictive performance in your applications. With just a few lines of code, you can train and deploy high-accuracy machine learning and deep learning models on image, text, time series, and tabular data.


# First install package from terminal:
# pip install -U pip
# pip install -U setuptools wheel
# pip install autogluon  # autogluon==0.6.0

from autogluon.tabular import TabularDataset, TabularPredictor
train_data = TabularDataset('')
test_data = TabularDataset('')
predictor = TabularPredictor(label='class').fit(train_data, time_limit=120)  # Fit models for 120s
leaderboard = predictor.leaderboard(test_data)


See the AutoGluon Website for documentation and instructions on:

  • Installing AutoGluon
  • Learning with tabular data
  • Tips to maximize accuracy (if benchmarking, make sure to run fit() with argument presets='best_quality')

Refer to the AutoGluon Roadmap for details on upcoming features and releases.




DGL is an easy-to-use, high-performance and scalable Python package for deep learning on graphs. DGL is framework agnostic: if a deep graph model is a component of an end-to-end application, the rest of the logic can be implemented in any major framework, such as PyTorch, Apache MXNet or TensorFlow.

A GPU-ready graph library

DGL provides a powerful graph object that can reside on either CPU or GPU. It bundles structural data as well as features for better control. We provide a variety of functions for computing with graph objects including efficient and customizable message passing primitives for Graph Neural Networks.
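To make the idea of message passing concrete, here is a minimal sketch in plain Python. It does not use DGL's API: the function name `sum_message_passing` and the edge-list representation are assumptions chosen for illustration. Each edge sends the source node's feature vector to its destination, and every node sums the messages it receives.

```python
from collections import defaultdict

def sum_message_passing(edges, features):
    """One round of sum-aggregation message passing.

    edges: list of (src, dst) node-id pairs.
    features: dict mapping node id -> list of floats (same length for all).
    Returns a dict mapping node id -> sum of incoming messages.
    """
    dim = len(next(iter(features.values())))
    inbox = defaultdict(lambda: [0.0] * dim)
    for src, dst in edges:
        # The message on each edge is simply the source node's feature.
        for i, value in enumerate(features[src]):
            inbox[dst][i] += value
    # Nodes without incoming edges end up with a zero vector.
    return {node: inbox[node] for node in features}

edges = [(0, 1), (0, 2), (1, 2)]
features = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [3.0, 3.0]}
aggregated = sum_message_passing(edges, features)
print(aggregated)
```

A graph library would typically let the message and reduce steps be swapped for other functions (mean, max, learned transforms) and run them in batched form on the GPU; the loop above only shows the data flow.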

A versatile tool for GNN researchers and practitioners

The field of graph deep learning is still rapidly evolving and many research ideas emerge by standing on the shoulders of giants. To ease the process, DGL-Go is a command-line interface to get started with training, using and studying state-of-the-art GNNs. DGL collects a rich set of example implementations of popular GNN models across a wide range of topics. Researchers can search for related models to draw new ideas from or use them as baselines for experiments. Moreover, DGL provides many state-of-the-art GNN layers and modules for users to build new model architectures. DGL is one of the preferred platforms for many standard graph deep learning benchmarks including OGB and GNNBenchmarks.

Easy to learn and use

DGL provides plenty of learning materials for all kinds of users, from ML researchers to domain experts. The Blitz Introduction to DGL is a 120-minute tour of the basics of graph machine learning. The User Guide explains in more detail the concepts of graphs as well as the training methodology. All of them include code snippets in DGL that are runnable and ready to be plugged into one’s own pipeline.

Scalable and efficient

It is convenient to train models using DGL on large-scale graphs across multiple GPUs or multiple machines. DGL extensively optimizes the whole stack to reduce the overhead in communication, memory consumption and synchronization. As a result, DGL can easily scale to billion-sized graphs. Get started with the tutorials and user guide for distributed training. See the system performance note for the comparison with other tools.



Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. MXNet is portable and lightweight, scalable to many GPUs and machines.

Apache MXNet is more than a deep learning project. It is a community on a mission of democratizing AI. It is a collection of blueprints and guidelines for building deep learning systems, and of interesting insights into DL systems for hackers.

Licensed under an Apache-2.0 license.

Parameter Server


A light and efficient implementation of the parameter server framework. It provides clean yet powerful APIs. For example, a worker node can communicate with the server nodes by:

  • Push(keys, values): push a list of (key, value) pairs to the server nodes
  • Pull(keys): pull the values from servers for a list of keys
  • Wait: wait until a push or pull has finished

It also offers:

  • Flexible and high-performance communication: zero-copy push/pull, supporting dynamic length values, user-defined filters for communication compression
  • Server-side programming: supporting user-defined handles on server nodes

A simple example:

  std::vector<uint64_t> key = {1, 3, 5};
  std::vector<float> val = {1, 1, 1};
  std::vector<float> recv_val;
  ps::KVWorker<float> w;
  w.Wait(w.Push(key, val));
  w.Wait(w.Pull(key, &recv_val));
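The asynchronous push/pull/wait pattern can also be sketched in a few lines of Python. This is a toy, single-process stand-in and not the library's implementation: the classes `ToyKVServer` and `ToyKVWorker` are hypothetical, and the server-side rule of summing pushed values per key is an assumption made for the example. Push and pull return immediately with a timestamp, and `wait` blocks until the corresponding request has completed.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

class ToyKVServer:
    """In-process stand-in for the server nodes; sums pushed values per key."""
    def __init__(self):
        self._store = {}
        self._lock = threading.Lock()

    def push(self, keys, values):
        with self._lock:
            for k, v in zip(keys, values):
                self._store[k] = self._store.get(k, 0.0) + v

    def pull(self, keys):
        with self._lock:
            return [self._store.get(k, 0.0) for k in keys]

class ToyKVWorker:
    """Worker whose push and pull return a timestamp that can be waited on."""
    def __init__(self, server):
        self._server = server
        # A single background thread keeps requests in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = {}
        self._next_ts = 0

    def _submit(self, fn, *args):
        ts = self._next_ts
        self._next_ts += 1
        self._pending[ts] = self._pool.submit(fn, *args)
        return ts

    def push(self, keys, values):
        return self._submit(self._server.push, keys, values)

    def pull(self, keys, out):
        def task():
            out[:] = self._server.pull(keys)  # fill the caller's buffer
        return self._submit(task)

    def wait(self, ts):
        self._pending.pop(ts).result()  # blocks until the request is done

    def close(self):
        self._pool.shutdown()

server = ToyKVServer()
worker = ToyKVWorker(server)
keys, vals, recv = [1, 3, 5], [1.0, 1.0, 1.0], []
worker.wait(worker.push(keys, vals))
worker.wait(worker.pull(keys, recv))
print(recv)
worker.close()
```

Returning a timestamp instead of blocking lets a worker overlap communication with computation and wait only at the point where the result is actually needed.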

Research papers


This repo has code for fast sampling for Latent Variable Models. It has implementations of the following for large scale deployment:

  1. CoverTree - Fast nearest neighbour search
  2. KMeans - Simple, fast, and distributed clustering with a choice of initialization schemes
  3. GMM - Fast and distributed inference for Gaussian Mixture Models with diagonal covariance matrices
  4. LDA - Fast and distributed inference for Latent Dirichlet Allocation
  5. GLDA - Fast and distributed inference for Gaussian LDA with diagonal covariance matrices
  6. HDP - Fast inference for Hierarchical Dirichlet Process

Credits and Acknowledgments

We use a distributed and parallel extension and implementation of the Cover Tree data structure for nearest neighbour search. The data structure was originally presented in the first paper below and improved in the second:

  • Alina Beygelzimer, Sham Kakade, and John Langford. “Cover trees for nearest neighbor.” Proceedings of the 23rd international conference on Machine learning. ACM, 2006.
  • Mike Izbicki and Christian Shelton. “Faster cover trees.” Proceedings of the 32nd International Conference on Machine Learning (ICML-15). 2015.

We implement a modified inference for Gaussian LDA. The original model was presented in:

  • Rajarshi Das, Manzil Zaheer, Chris Dyer. “Gaussian LDA for Topic Models with Word Embeddings.” Proceedings of ACL (pp. 795-804) 2015.

We implement a modified inference for Hierarchical Dirichlet Process. The original model and inference methods were presented in:

  • Y. Teh, M. Jordan, M. Beal, and D. Blei. Hierarchical dirichlet processes. Journal of the American Statistical Association, 101(576):1566-1581, 2006.
  • C. Chen, L. Du, and W.L. Buntine. Sampling table configurations for the hierarchical poisson-dirichlet process. In European Conference on Machine Learning, pages 296-311. Springer, 2011.

Yahoo LDA

This was the first distributed topic model solver that could operate at internet scale (or whatever this was in 2010). It uses a collapsed Gibbs sampler and a prototype of the Parameter Server distributed optimization and synchronization pattern. For a description of the basic algorithm see this VLDB 2010 paper.
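For readers unfamiliar with collapsed Gibbs sampling for LDA, the following is a minimal single-machine sketch of the textbook sampler. It is not the Yahoo LDA code and has none of its distributed machinery; the function name `collapsed_gibbs_lda`, the hyperparameter defaults and the toy corpus are assumptions for illustration.

```python
import random

def collapsed_gibbs_lda(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
                        sweeps=50, seed=0):
    """Textbook collapsed Gibbs sampling for LDA on a single machine.

    docs: list of documents, each a list of integer word ids.
    Returns the (doc_topic, topic_word) count tables.
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Assign a random initial topic to every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word

docs = [[0, 1, 0, 1, 0], [2, 3, 3, 2, 2], [0, 1, 1, 0, 3]]
doc_topic, topic_word = collapsed_gibbs_lda(docs, num_topics=2, vocab_size=4)
print(doc_topic)
```

The sampler only ever touches count tables, which is what makes the approach amenable to distribution: workers can sample on local documents while the shared topic-word counts are synchronized in the background.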

Bundle Methods Solver

BMRM is an open source, modular and scalable convex solver for many machine learning problems cast in the form of regularized risk minimization problems. It is modular because the (problem-specific) loss function module is decoupled from the (regularization-specific) optimization module (e.g. quadratic programming or linear programming solvers), thus shortening the time needed to implement or prototype solutions to new problems. In addition, the decoupling leads to easier parallelization of the loss function computation. At the moment it solves the following problems:

  • Binary classification: Soft-margin, Squared Soft-margin, Huber-hinge, Logistic regression, Exponential, ROC Score, F-beta Score
  • Univariate regression: \(\epsilon\)-insensitive, Huber robust, Least Mean Squares, Least Absolute Deviation
  • Novelty detection (1-class SVM)
  • Quantile regression
  • Poisson regression
  • Ranking: NDCG (normalized discounted cumulative gain), Ordinal regression
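The decoupling of loss and regularizer described above can be illustrated with a short sketch. It is not BMRM's code: the function names are hypothetical, the regularizer is assumed to be the squared L2 norm, and plain subgradient descent is swapped in for the bundle method to keep the example short. The loss module only returns its value and a subgradient, so a different loss could be plugged in without touching the optimizer loop.

```python
def soft_margin_loss(w, data):
    """Average hinge loss over data and one of its subgradients w.r.t. w."""
    loss, grad = 0.0, [0.0] * len(w)
    for x, y in data:
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:
            loss += 1.0 - margin
            for i, xi in enumerate(x):
                grad[i] -= y * xi
    m = len(data)
    return loss / m, [g / m for g in grad]

def regularized_risk(w, data, loss_fn, lam):
    """J(w) = lam/2 * ||w||^2 + empirical risk, together with a subgradient."""
    risk, risk_grad = loss_fn(w, data)
    value = 0.5 * lam * sum(wi * wi for wi in w) + risk
    grad = [lam * wi + gi for wi, gi in zip(w, risk_grad)]
    return value, grad

# Toy binary classification data: (features, label in {-1, +1}).
data = [([1.0, 2.0], 1), ([-1.0, -1.0], -1), ([2.0, 0.0], 1), ([0.0, -2.0], -1)]
lam, step_size = 0.1, 0.1
w = [0.0, 0.0]
initial_value, _ = regularized_risk(w, data, soft_margin_loss, lam)
for _ in range(200):
    # Subgradient descent stands in for the bundle method here.
    _, grad = regularized_risk(w, data, soft_margin_loss, lam)
    w = [wi - step_size * gi for wi, gi in zip(w, grad)]
final_value, _ = regularized_risk(w, data, soft_margin_loss, lam)
print(initial_value, final_value, w)
```

Because the empirical risk is a sum over examples, the loss module is also the natural place to parallelize: each worker can compute the loss and subgradient on its share of the data.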

Many thanks to Jan Funke for forking and thus rescuing Choon-Hui Teo’s code.

Collaborative Filtering

Cofirank is our collaborative filtering solver. It is built on top of the bundle method solver. The goal is to predict preferences of users based on past ratings by them and other users. We build upon the approach of Maximum Margin Matrix Factorization, yet extend it in several ways:

  • Cofi can make use of state-of-the-art optimization technology, making it feasible to run on the largest data sets available. Cofi can be run on a single machine with moderate memory requirements (2GB) to train on the Netflix dataset with its 100,000,000 entries.
  • Cofi is able to do structured prediction, e.g. by predicting the relative order in which you like movies instead of the absolute rating you would give them. This allows for models that are better suited to predict what you like than dislike. This property is important for recommender systems.
  • Cofi can be parallelized easily to take advantage of multi core machines or clusters of workstations.
  • In addition, Cofi has some desirable properties which stem from MMMF: it does not need explicit features of the items or users. However, it can use them whenever available.
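To illustrate the difference between predicting an order and predicting absolute ratings, here is a small sketch. It is not Cofirank's actual loss or code: the helper names, the toy factors and the simple pairwise error measure are assumptions for the example. Scores come from inner products of user and item factors, as in matrix factorization, and are judged only by how many item pairs they order incorrectly.

```python
def predict_scores(user_factors, item_factors):
    """Score of each item is the inner product of user and item factors."""
    return [sum(u * v for u, v in zip(user_factors, item))
            for item in item_factors]

def pairwise_ranking_error(scores, ratings):
    """Fraction of item pairs with different ratings that the scores misorder."""
    wrong = total = 0
    for i in range(len(ratings)):
        for j in range(i + 1, len(ratings)):
            if ratings[i] == ratings[j]:
                continue  # ties carry no ordering information
            total += 1
            # Misordered when score and rating differences disagree in sign.
            if (scores[i] - scores[j]) * (ratings[i] - ratings[j]) <= 0:
                wrong += 1
    return wrong / total if total else 0.0

user = [1.0, 0.5]                             # hypothetical user factors
items = [[2.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # hypothetical item factors
ratings = [5, 3, 4]                           # the user's observed ratings
scores = predict_scores(user, items)
error = pairwise_ranking_error(scores, ratings)
print(scores, error)
```

The scores here are far from the rating values themselves, yet two of the three pairs are ordered correctly; an ordering-based objective rewards exactly that and ignores the scale of the scores.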


This is an old MATLAB toolbox that I wrote mainly for my own purposes. Some early versions have been released and the code is now completely unmaintained. At the time it was our main workhorse for kernel methods.

It may not even run on a recent version of MATLAB any more. I abandoned the work after the Mathworks increased the price for a license by a factor of 10 because they decided to classify NICTA as an industrial research lab. Please do not ask me for help with bugfixes or instructions. The code is provided as is and it is over 20 years old! It is released under the GNU Public License.