Alex Smola
Adventures in Data Land
Categories
All (33), acceleration (1), Amazon (1), big learning (1), Bloom filter (1), book (1), caching (1), classification (1), CMU (1), collaborative filtering (2), covariate shift (1), d2l (1), data privacy (1), distance (1), distributed learning (2), distributed synchronization (1), dot product (1), feistel network (1), floating point (1), graphical models (3), graphs (1), hashing (6), kernel (2), kernels (1), latency (1), LDA (1), linear algebra (1), linear function (1), MLSS (3), model selection (1), optimization (4), parameter server (2), Pittsburgh (1), Purdue (1), random features (1), random numbers (1), regularization (1), sampler (1), sampling (2), search (1), semiring (1), social networks (1), softmax (1), sparsity (3), stream (1), trick (2), tutorial (2), Weisfeiler Leman (1), why (1), workshop (1)
Dive into Deep Learning
d2l, book
I’m happy to announce our new book project, Dive into Deep Learning. It’s still in beta, i.e. we’re still working on it. That said, I think it’s good enough to…
Jan 15, 2019
Leaving CMU
CMU, Amazon
Dear Friends,
As some of you may have already heard, I’m leaving CMU to join Amazon, effective July 1, 2016. There I will be in charge of Amazon’s Cloud Machine Learning…
Jun 1, 2016
MLSS, Pittsburgh
Zico Kolter and I proudly announce the 2014 Machine Learning Summer School in Pittsburgh. It will be held at Carnegie Mellon University on July 7-18, 2014. Our focus is on…
May 15, 2014
Distributing Data in a Parameterserver
parameter server
One of the key features of a parameter server is that it, well, serves parameters. In particular, it serves more parameters than a single machine can typically hold and…
Jan 15, 2013
100 Terabytes, 5 Billion Documents, 10 Billion Parameters, 1 Billion Inserts/s
parameter server, distributed learning
We’ve been busy building the next generation of a Parameter Server and it’s finally ready. Check out the OSDI 2014 paper by Li et al. It’s quite different from our previous…
Dec 15, 2012
Beware the bandwidth gap - speeding up optimization
optimization, caching
Disks are slow and RAM is fast. Everyone knows that. But many optimization algorithms don’t take advantage of this. More to the point, disks currently stream at about…
Oct 15, 2012
The Weisfeiler-Leman algorithm and estimation on graphs
Weisfeiler Leman, graphs, kernels
Imagine you have two graphs \(G\) and \(G′\) and you’d like to check how similar they are. If all vertices have…
Sep 15, 2012
In defense of keeping data private
data privacy, social networks
This is going to be contentious. And it somewhat goes against a lot of things that researchers hold holy. And it goes against my plan of…
Aug 15, 2012
MLSS Purdue
MLSS, Purdue
The videos from MLSS 2011 in Purdue are now available online. Unfortunately, the 2011 MLSS website is gone, and with it all the playlists and PDFs. The only thing left is a…
Jul 15, 2012
Random numbers in constant storage
random numbers, hashing
Many algorithms require random number generators to work. For instance, locality sensitive hashing requires one to compute the random projection matrix \(P\) in order to…
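A minimal sketch of the general idea (assumed here, since the teaser is truncated): instead of storing the random matrix, regenerate any entry on demand by seeding a generator from a hash of its indices, so storage stays constant.

```python
import hashlib
import random

def projection_entry(i, j, seed=0):
    """Regenerate entry P[i, j] of a random Gaussian matrix on demand.

    Deterministic: the same (seed, i, j) always yields the same value,
    so the matrix never needs to be stored.
    """
    key = f"{seed}:{i}:{j}".encode()
    h = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(h)  # seed a PRNG from the hash of the coordinates
    return rng.gauss(0.0, 1.0)
```

The names and the Gaussian choice are illustrative assumptions, not the post's actual construction.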
Jun 15, 2012
tutorial, graphical models
The slides for the NIPS 2011 tutorial on Graphical Models for the Internet are online. Lots of stuff on parallelization, applications to user modeling, content…
May 15, 2012
The Neal Kernel and Random Kitchen Sinks
kernel, random features
So you read a book on Reproducing Kernel Hilbert Spaces and you’d like to try out this kernel thing. But you’ve got a lot of data and most algorithms will give you an…
Apr 15, 2012
Big Learning: Algorithms, Systems, and Tools for Learning at Scale
workshop, big learning
We’re organizing a workshop at NIPS 2011. Submissions are solicited for a two-day workshop, December 16-17, in Sierra Nevada, Spain.
Aug 15, 2011
Introduction to Graphical Models
graphical models, MLSS
Here are the slides [Keynote, PDF] for a basic course on Graphical Models for the Internet that I’m giving at MLSS 2011 in Purdue that Vishy Vishwanathan is organizing. The…
Aug 10, 2011
Distributed synchronization with the distributed star
distributed synchronization, hashing
Here’s a simple synchronization paradigm between many computers that scales with the number of machines involved and which essentially keeps cost at \(O(1)\) per machine.…
Aug 5, 2011
Speeding up Latent Dirichlet Allocation
LDA, sampler
The code to our LDA implementation on Hadoop is released on Github under the Mozilla Public License. It’s seriously fast and scales very well to 1000 machines or more (don’t…
Aug 1, 2011
Bloom Filters
Bloom filter, hashing
Bloom filters are one of the really ingenious and simple building blocks for randomized data structures. A great summary is the paper by Broder and Mitzenmacher, 2005. The…
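As a rough illustration of the idea (a sketch, not the post's own code): a Bloom filter sets \(k\) hash-derived bits per inserted key; lookups may report false positives but never false negatives.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions from a single SHA-256 digest (32 bytes).
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.k):
            chunk = digest[4 * i : 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # All k bits set => "probably present"; any unset bit => definitely absent.
        return all(self.bits[pos] for pos in self._positions(item))
```

With sensible m and k the false-positive rate is roughly \((1 - e^{-kn/m})^k\) for n inserted keys, as analyzed in the Broder and Mitzenmacher survey the post mentions.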
Jul 30, 2011
Real simple covariate shift correction
covariate shift, classification
Imagine you want to design some algorithm to detect cancer. You get data of healthy and sick people; you train your algorithm; it works fine, giving you high accuracy and…
Jul 25, 2011
graphical models, tutorial
Here are a few tutorial slides I prepared with Amr Ahmed for WWW 2011 in Hyderabad next week. They describe in fairly basic (and in the end rather advanced) terms how one…
Jul 20, 2011
Memory Latency, Hashing, Optimal Golomb Rulers and Feistel Networks
hashing, feistel network, latency
In many problems involving hashing we want to look up a range of elements from a vector where the elements are indicated by a hash function \(h\). For instance, we might…
Jul 15, 2011
Collaborative Filtering considered harmful
collaborative filtering, search
Much excellent work has been published on collaborative filtering, in particular in terms of recovering missing entries in a matrix. The Netflix contest has contributed a…
Jul 10, 2011
Why?
why
Some readers might wonder why I’m writing this blog. Here’s an (incomplete) list:
Jul 5, 2011
Hashing for Collaborative Filtering
hashing, collaborative filtering
This is a follow-up on the hashing for linear functions post. It’s based on the HashCoFi paper that Markus Weimer, Alexandros Karatzoglou and I wrote for AISTATS’10. It…
Jul 1, 2011
Priority Sampling
sampling, sparsity
Tamas Sarlos pointed out a much smarter strategy on how to obtain a sparse representation of a (possibly dense) vector: Priority Sampling by Duffield, Lund and Thorup, 2006.…
Jun 30, 2011
Random elements from a stream
sampling, stream
This is a classic trick when dealing with data streams. It shows how to draw a random element from a sequence of instances without knowing beforehand how long the sequence…
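The classic trick is reservoir sampling (a sketch of the standard algorithm, not the post's own code): keep the \(n\)-th element with probability \(1/n\), and by induction every element seen so far survives with probability \(1/n\).

```python
import random

def reservoir_sample(stream, rng=random):
    """Draw one element uniformly from an iterable of unknown length."""
    choice = None
    for n, item in enumerate(stream, start=1):
        # Replace the current choice with probability 1/n. The first
        # element is always taken (1/1), and after seeing n elements each
        # one is retained with probability exactly 1/n.
        if rng.random() < 1.0 / n:
            choice = item
    return choice
```

The same idea generalizes to sampling \(k\) elements by keeping a size-\(k\) reservoir.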
Jun 25, 2011
Sparsifying a vector/matrix
sparsity
Sometimes we want to compress vectors to reduce memory footprint or to minimize computational cost. For instance in deep learning we can accelerate operations by keeping only …
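The teaser is cut off, so the post's exact scheme is unclear; one simple sparsification along these lines (an assumption, not necessarily the post's method) is to keep only the k largest-magnitude entries and zero out the rest.

```python
import numpy as np

def sparsify_topk(x, k):
    """Keep the k largest-magnitude entries of x, zeroing the rest."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]  # indices of the k largest magnitudes
    out[idx] = x[idx]
    return out
```

Note that hard truncation is biased; randomized schemes such as the priority sampling mentioned above give unbiased sparse estimates.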
Jun 20, 2011
Log-probabilities, semirings and floating point numbers
floating point, softmax, semiring
Here’s a trick/bug that is a) really well known in the research community, b) something lots of beginners nonetheless get wrong, c) something simple unit tests may not detect, and d) it…
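The teaser is truncated, but given the tags the trick is presumably the standard log-sum-exp shift: summing probabilities in the linear domain underflows for very negative log-probabilities, while subtracting the maximum before exponentiating stays stable. A sketch:

```python
import math

def log_sum_exp(log_probs):
    """Compute log(sum_i exp(l_i)) without underflow.

    Shifting by the maximum keeps at least one exp() argument at 0,
    so the sum never collapses to 0 even for very negative inputs.
    """
    m = max(log_probs)
    return m + math.log(sum(math.exp(l - m) for l in log_probs))
```

For example, the naive `math.log(math.exp(-1000) + math.exp(-1000))` fails because `math.exp(-1000)` underflows to 0, whereas the shifted version returns \(-1000 + \log 2\).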
May 1, 2011
Parallel Stochastic Gradient Descent
distributed learning, optimization
Here’s the problem: you’ve optimized your stochastic gradient descent library but the code is still not fast enough. When streaming data off a disk/network you cannot exceed…
Apr 1, 2011
Hashing for Linear Functions
linear function, hashing
This is the first of a few posts on hashing. It’s an incredibly powerful technique when working with discrete objects and sequences. And it’s also idiot-proof simple. I…
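As a hedged sketch of the general idea (the teaser is truncated, so this may differ from the post's version): feature hashing maps discrete tokens into a fixed-size vector, with a sign hash keeping inner products approximately unbiased.

```python
import hashlib

def hash_features(tokens, m=2 ** 10):
    """Map a bag of tokens into an m-dimensional vector via hashing."""
    x = [0.0] * m
    for t in tokens:
        h = hashlib.md5(t.encode()).digest()
        idx = int.from_bytes(h[:4], "big") % m       # bucket index
        sign = 1.0 if h[4] % 2 == 0 else -1.0        # sign hash
        x[idx] += sign
    return x
```

No dictionary is needed, memory is fixed at m, and collisions only add bounded noise to dot products.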
Mar 1, 2011
In Praise of the Second Binomial Formula
distance, dot product, linear algebra
Here’s a simple trick you can use to compute pairs of distances: use the second binomial formula and a linear algebra library. These problems occur in RBF kernel…
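The trick expands \(\|x - y\|^2 = \|x\|^2 - 2\langle x, y\rangle + \|y\|^2\), so all pairwise squared distances reduce to one matrix product plus broadcast additions. A minimal NumPy sketch:

```python
import numpy as np

def pairwise_sq_dists(X, Y):
    """Squared Euclidean distances between all rows of X and rows of Y,
    via ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2 and one matrix product."""
    x2 = (X ** 2).sum(axis=1)[:, None]  # shape (n, 1)
    y2 = (Y ** 2).sum(axis=1)[None, :]  # shape (1, m)
    d2 = x2 - 2.0 * X @ Y.T + y2
    return np.maximum(d2, 0.0)          # clip tiny negatives from round-off
```

The matrix product dominates the cost and is handled by an optimized BLAS, which is exactly why this beats a double loop over pairs.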
Jan 1, 2011
Lazy updates for generic regularization in SGD
sparsity, optimization
Yesterday I wrote about how to do fast stochastic gradient descent updates for quadratic regularization. However, there are lots more regularizers which one would want to…
Nov 12, 2010
Easy Kernel Width Selection
kernel, trick, model selection
This is an idea that was originally put forward by Bernhard Schölkopf in his thesis: Assume you have an RBF (radial basis function) kernel and you want to know how to scale…
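One common form of this idea is the median heuristic: set the RBF kernel scale to the median pairwise distance in the data. A sketch, assuming that is the trick meant here:

```python
import numpy as np

def median_heuristic_sigma(X):
    """Kernel width as the median over all distinct pairwise distances."""
    diffs = X[:, None, :] - X[None, :, :]          # (n, n, d) differences
    d = np.sqrt((diffs ** 2).sum(axis=-1))         # (n, n) distance matrix
    iu = np.triu_indices(len(X), k=1)              # distinct pairs only
    return np.median(d[iu])
```

For large datasets one would take the median over a random subsample of pairs rather than all \(O(n^2)\) of them.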
Oct 12, 2010
Fast quadratic regularization for online learning
optimization, acceleration, regularization, trick
After a few discussions within Yahoo I’ve decided to post a bunch of tricks here. A lot of these are well known. Some others might be new. They’re small hacks, too small to…
Jan 15, 2010