Alex Smola - Adventures in Data Land

Dive into Deep Learning

d2l

book

I’m happy to announce our new book project - Dive into Deep Learning. It’s still in beta stage, i.e. we’re still working…

Leaving CMU

CMU

Amazon

Dear Friends,
As some of you may have already heard, I’m leaving CMU to join Amazon, effective July 1, 2016. There I will be in charge of Amazon’s Cloud Machine Learning Platform with…

MLSS

Pittsburgh

Zico Kolter and I proudly announce the 2014 Machine Learning Summer School in Pittsburgh. It will be held at Carnegie Mellon University on July 7-18, 2014. Our focus is on…

Distributing Data in a Parameterserver

parameter server

One of the key features of a parameter server is that it, well, serves parameters. In particular, it serves more parameters than a single machine can typically hold and…

100 Terabytes, 5 Billion Documents, 10 Billion Parameters, 1 Billion Inserts/s

parameter server

distributed learning

We’ve been busy building the next generation of a Parameter Server and it’s finally ready. Check out the OSDI 2014 paper by Li et al. It’s quite different from our previous designs, the main improvements being fault tolerance and self repair, a much improved network protocol…

Beware the bandwidth gap - speeding up optimization

optimization

caching

Disks are slow and RAM is fast. Everyone knows that. But many optimization algorithms don’t take advantage of this. More to the point, disks currently stream at about…

The Weisfeiler-Leman algorithm and estimation on graphs

Weisfeiler Leman

graphs

kernels

Imagine you have two graphs \(G\) and \(G'\) and you’d like to check how…

In defense of keeping data private

data privacy

social networks

This is going to be contentious. And it somewhat goes against a lot of things that researchers hold holy. And it goes…

MLSS Purdue

MLSS

Purdue

Random numbers in constant storage

random numbers

hashing

Many algorithms require random number generators to work. For instance, locality sensitive hashing requires one to compute the random projection matrix \(P\) in order to…

tutorial

graphical models

The slides for the NIPS…

The Neal Kernel and Random Kitchen Sinks

kernel

random features

So you read a book on Reproducing Kernel Hilbert Spaces and you’d like to try out this kernel thing. But you’ve got a lot of data and most algorithms will give you an expansion that requires a number of kernel functions linear in…

Big Learning: Algorithms, Systems, and Tools for Learning at Scale

workshop

big learning

We’re organizing a workshop at NIPS 2011. Submissions are solicited for a two-day workshop December 16-17 in Sierra Nevada, Spain.

Introduction to Graphical Models

graphical models

MLSS

Here are the slides [Keynote, PDF] for a basic course on Graphical Models for the Internet that I’m giving at MLSS 2011 at Purdue that Vishy…

Distributed synchronization with the distributed star

distributed synchronization

hashing

Here’s a simple synchronization paradigm between many computers that scales with the number of machines involved and which essentially keeps cost at \(O(1)\) per machine. For lack of a better name I’m going to call it the distributed star since this is what the communication looks like. It’s quite similar to how memcached stores…

Speeding up Latent Dirichlet Allocation

LDA

sampler

The code for our LDA implementation on Hadoop is released on GitHub under the Mozilla Public License. It’s seriously fast and scales very well to 1000 machines or more (don’t worry, it runs on a single machine, too). We believe that at…

Bloom Filters

Bloom filter

hashing

Bloom filters are one of the really ingenious and simple building blocks for randomized data structures. A great summary is the paper by Broder and Mitzenmacher, 2005. The figure above is from their paper. In this post I will briefly review their key ideas since they form the basis of the Count-Min…
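
For readers who have not met the data structure before, here is a minimal sketch of a Bloom filter in Python. It is not taken from the post or from the Broder and Mitzenmacher paper. The class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the use of salted SHA-256 as the hash family are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Approximate set membership: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Hypothetical defaults; real deployments size these from the
        # expected number of items and the target false-positive rate.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit the item hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # Report membership only if all of the item's bits are set.
        return all(self.bits[pos] for pos in self._positions(item))
```

After `bf = BloomFilter()` and `bf.add("apple")`, the test `"apple" in bf` is always true, whereas a lookup for an item that was never added is usually, but not always, false.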

Real simple covariate shift correction

covariate shift

classification

Imagine you want to design some algorithm to detect cancer. You get data of healthy and sick people; you train your algorithm; it works fine, giving you high accuracy and…

graphical models

tutorial

Here are a few tutorial slides I prepared with Amr Ahmed for WWW 2011 in Hyderabad next week. They describe in fairly basic (and in the end rather advanced) terms how one might use…

Memory Latency, Hashing, Optimal Golomb Rulers and Feistel Networks

hashing

feistel network

latency

In many problems involving hashing we want to look up a range of elements from a vector where the elements are indicated by a hash function \(h\). For…

Collaborative Filtering considered harmful

collaborative filtering

search

Much excellent work has been published on…

Why?

why

Some readers might wonder why I’m writing this blog. Here’s an (incomplete) list:

Hashing for Collaborative Filtering

hashing

collaborative filtering

This is a follow-up on the hashing for linear functions post. It’s based on the HashCoFi paper that Markus Weimer, Alexandros Karatzoglou and I wrote for AISTATS’10. It deals with the issue of running out of memory when you want to use collaborative filtering for…

Priority Sampling

sampling

sparsity

Tamas Sarlos pointed out a much smarter strategy on how to obtain a sparse representation of a (possibly dense) vector: Priority Sampling by Duffield, Lund and Thorup, 2006. The idea is quite ingenious and (surprisingly so) essentially…

Random elements from a stream

sampling

stream

This is a classic trick when dealing with data streams. It shows how to draw a random element from a sequence of instances without knowing beforehand how long the sequence…
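
One standard way to do this is reservoir sampling with a reservoir of size one. The sketch below assumes that this is the trick the post has in mind. The function name `random_element` and the optional `rng` argument are hypothetical.

```python
import random


def random_element(stream, rng=random):
    """Pick one element uniformly at random from an iterable of unknown length.

    Returns None if the stream is empty.
    """
    chosen = None
    for n, item in enumerate(stream, start=1):
        # Keep the n-th item with probability 1/n. By induction every item
        # seen so far is then the current choice with probability 1/n.
        if rng.random() < 1.0 / n:
            chosen = item
    return chosen
```

Only the current choice and a counter are kept, so memory use is constant regardless of how long the stream turns out to be.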

Sparsifying a vector/matrix

sparsity

Sometimes we want to compress vectors to reduce memory footprint or to minimize computational cost. For instance in deep learning we can accelerate operations by keeping only 2 out of 4 of all…

Log-probabilities, semirings and floating point numbers

floating point

softmax

semiring

Here’s a trick/bug that is a) really well known in the research community, b) lots of beginners…

Parallel Stochastic Gradient Descent

distributed learning

optimization

Here’s the problem: you’ve optimized your stochastic gradient descent library but the code is still not fast enough. When streaming data off a disk/network you cannot exceed…

Hashing for Linear Functions

linear function

hashing

This is the first of a few posts on hashing. It’s an incredibly powerful technique when working with discrete objects and sequences. And…

In Praise of the Second Binomial Formula

distance

dot product

linear algebra

Here’s a simple trick you can use to compute pairs of distances: use the second binomial formula and a linear algebra library. These problems occur in RBF kernel…
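
Written out, the identity behind the trick is presumably the following. The symbols \(x_i\), \(y_j\), \(X\), \(Y\) and \(D\) are notation assumed here, not taken from the post.

\[
\|x_i - y_j\|^2 = \|x_i\|^2 - 2\, x_i^\top y_j + \|y_j\|^2
\]

If the \(x_i\) are stacked as the rows of a matrix \(X\) and the \(y_j\) as the rows of \(Y\), all pairwise squared distances follow from a single matrix product:

\[
D = \operatorname{diag}(X X^\top)\, \mathbf{1}^\top - 2\, X Y^\top + \mathbf{1}\, \operatorname{diag}(Y Y^\top)^\top ,
\qquad D_{ij} = \|x_i - y_j\|^2 .
\]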

Lazy updates for generic regularization in SGD

sparsity

optimization

Yesterday I wrote about how to do fast stochastic gradient descent updates for quadratic…

Easy Kernel Width Selection

kernel

trick

model selection

This is an idea that was originally put forward by Bernhard Schölkopf in his thesis: Assume you have an…

Fast quadratic regularization for online learning

optimization

acceleration

regularization

trick

After a few discussions within Yahoo I’ve decided to post a bunch of tricks here. A lot of these are well known. Some others might be new. They’re small…