NIPS 2002 Workshop

Unreal Data: Principles of Modeling Nonvectorial Data

Organizers: Zoubin Ghahramani, Gunnar Rätsch, Alex Smola

Location: Whistler, BC, December 13, 2002

Abstract

A large amount of research in machine learning is concerned with classification and regression for real-valued data that can easily be embedded into a Euclidean vector space. This is in stark contrast with many real-world problems, where the data is often a highly structured combination of features (e.g., natural language and speech processing), a sequence of symbols (e.g., bioinformatics), or a mixture of different modalities, and may have missing variables. The items in non-vectorial data sets can be one-dimensional structures (e.g., sequences), two-dimensional (e.g., images), three-dimensional (e.g., molecular descriptions), trees (e.g., XML documents), or other hybrid and not-so-easily classified data structures.

To address the problem of learning from non-vectorial data, various methods have been proposed, such as embedding the structures in Hilbert spaces (e.g., via kernels), the extraction and selection of features, proximity-based approaches, parameter constraints in graphical models, inductive logic programming, decision trees, or clever hand-crafted models.
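As a rough illustration of the kernel route mentioned above, the following minimal sketch compares two symbol sequences by counting shared length-k substrings (a k-spectrum kernel). This is one common choice picked here for illustration, not a method the workshop prescribes; the function names and the default k=2 are assumptions.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=2):
    """Inner product of the k-mer count vectors of s and t.

    The count vectors are an explicit (sparse) feature map, so the
    result is a valid kernel on sequences of any length.
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Counter returns 0 for k-mers that t does not contain.
    return sum(count * ct[kmer] for kmer, count in cs.items())


if __name__ == "__main__":
    print(spectrum_kernel("ABAB", "BABA"))
```

The sequences never need to be padded or truncated to a common length; the kernel value alone can be passed to any kernel method.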

Aims of this workshop: The goal of this workshop is twofold. First, we hope to make the machine learning community aware of the problems that arise in domains where non-vectorial data abounds, and to uncover the pitfalls of mapping such data into vector spaces. Second, we will try to find a more uniform structure governing methods for dealing with non-vectorial data and to understand what principles, if any, underlie the modeling of such data.

Schedule

Morning Session

07:30–08:15 Thore Graepel – “Getting Real with Unreal Data: Lessons Learned and the Way Ahead”
08:15–08:55 Fernando Pereira – “Undirected graphical models for sequence analysis”
08:55–09:10 Coffee break
09:10–09:50 Koji Tsuda – “Marginalized Kernels for Biological Sequences”
09:50–10:30 Mehryar Mohri – “Algorithmic Challenges for Speech Mining”

Afternoon Session

16:00–16:40 Zoubin Ghahramani – “Graphical Models for Non-vectorial Data”
16:40–17:20 Alan Yuille – “The Structure in Computer Vision Problems”
17:20–17:35 Coffee break

Contributed Talks

17:35–17:55 Erik Miller – “Practical Non-parametric Density Estimation on a Transformation Group for Vision”
17:55–18:15 Thomas Gärtner – “Exponential and Geometric Kernels for Graphs”
18:15–18:35 S.V.N. Vishwanathan – “Kernels on Automata”
18:35–19:00 Paolo Frasconi – “Comparing convolutional kernels and recursive networks”

Resources

Abstracts & Slides