Introduction to Machine Learning - 10-701/15-781


The project nets you one third of the course credits. On top of that, it is a good opportunity to find out whether you like machine learning. Since this is research, there is no guarantee that the project will actually succeed. What matters most is that you try to solve the problem using a scientific approach.

  • Maximum team size is 4, and a typical team should have 3 members. Two are OK but it is in your interest to join a larger team since you will be able to accomplish more.

  • The project is comparable to an academic paper that you might send to a journal. That is, it isn't sufficient to just run a few MATLAB scripts and plot a bunch of figures.

  • In terms of topics, I am happy with implementations that build systems processing considerable amounts of data, with novel algorithms, and with proofs and new theory. It is preferable if your designs pass the scalability test. In other words, it's perfectly OK to prove theorems if they relate to a new algorithm which is scalable over many machines. Obviously, a mix of all three is best. A perfect score would be work at a level that can get into a tier 1 or 2 conference.

Use the office hours to get feedback and advice on your project. It is in your interest to do this. We will give you suggestions on how to solve problems, feedback on whether a project will succeed, and help with ideas.


For the project proposal each team needs to prepare a one-page report stating the problem, who will participate, and what the expected outcomes will be. It is due on February 11, 2013.

Midterm evaluation

For the midterm project evaluation each team needs to prepare a one-page report stating the problem and what has been achieved so far. It is due on March 6, 2013, i.e. two days after the midterm.


A written report using the ACM Style Guide: at least 4 and no more than 10 pages of two-column documentation.

  • The report should include pointers to code, data, etc. such that the work is sufficiently reproducible.

  • You need an abstract, introduction, a discussion of related work, a description of the main idea, a description of the data, experiments, and a summary.

  • Symbols must be defined before being used (or, at least, within the same paragraph if that is unavoidable). The human mind works like a compiler in this case. Compiler errors are bad for your grade. Just because it looks pretty doesn't mean it is pretty.

  • You need to be precise in the main body of the paper. The introduction can be used to provide the intuition.


  • A poster (between A1 and A0 size) for the poster presentation. See e.g. here for a style file. We will make a more concise style file available in due course.

  • If your project is very good, you will get the opportunity to present it in class. Only the six best posters will get this chance. There will be spotlights for other posters. Note that, just like at a conference, there is a correlation between your score and the amount of exposure, but not a direct mapping.

  • The Stanford ML class with Andrew Ng did a great job. Let's match this!

Heilmeier's criteria

You should be able to address Heilmeier's criteria, as adapted for the purpose of this class. This type of reasoning will help you with choosing your own research agenda, writing grants, convincing colleagues, securing VC funding, and writing papers. So it's good practice.

  • What are you trying to do? Articulate your objectives using absolutely no jargon.

  • How is it done today, and what are the limits of current practice?

  • What's new in your approach and why do you think it will be successful?

  • Who cares? If you're successful, what difference will it make?

  • What are the risks and the payoffs?

  • How long will it take and what have you achieved so far?

  • How will you determine success?