Modern information retrieval systems facilitate information access at unprecedented scale and sophistication. In many cases, however, the underlying representation of text remains quite simple, often limited to a weighted bag of words. Over the years, several approaches to automatic feature generation have been proposed (such as Latent Semantic Indexing, hashing, or Latent Dirichlet Allocation), yet their application in large-scale systems remains the exception rather than the rule. On the other hand, numerous studies in NLP and IR resort to manually crafting features, a laborious and often computationally expensive process. Such studies typically focus on one specific problem, so many of the features they define are task- or domain-dependent, and little of this knowledge transfers to other problem domains. This limits our understanding of how to reliably construct informative features for new tasks.
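To make the "weighted bag of words" representation concrete, the following is a minimal sketch of one common weighting scheme, TF-IDF (term frequency times inverse document frequency); the function name and the toy corpus are illustrative, and real systems add smoothing, normalisation, and sparse storage.

```python
import math
from collections import Counter

def tfidf(docs):
    """Weighted bag of words: raw term frequency scaled by log inverse
    document frequency. A bare-bones illustration, not a production scheme."""
    n = len(docs)
    # document frequency: in how many documents does each term occur?
    df = Counter(t for d in docs for t in set(d.split()))
    out = []
    for d in docs:
        tf = Counter(d.split())
        out.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return out

vecs = tfidf(["the cat sat", "the dog sat", "the dog barked"])
# "the" occurs in every document, so its weight collapses to zero,
# while rarer, more discriminative terms keep positive weight
assert vecs[0]["the"] == 0.0
assert vecs[0]["cat"] > 0
```

The point of the weighting is visible in the assertions: ubiquitous terms contribute nothing, which is exactly the behaviour a simple unweighted bag of words lacks.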
Physical and economic limitations have forced computer architecture towards parallelism and away from exponential frequency scaling. Meanwhile, increased access to ubiquitous sensing and the web has resulted in an explosion in the size of machine learning tasks. In order to benefit from current and future trends in processor technology we must discover, understand, and exploit the available parallelism in machine learning. This workshop will achieve four key goals:
When dealing with distributions, it is in general infeasible to estimate them explicitly in high-dimensional settings, since the associated learning rates can be arbitrarily slow. On the other hand, a great variety of applications in machine learning and computer science require distribution estimation and/or comparison. Examples include testing for homogeneity (the “two-sample problem”), independence, and conditional independence, where the last two can be used to infer causality; data set squashing, data sketching, and data anonymisation; domain adaptation (the transfer of knowledge learned on one domain to solving problems on another, related domain) and the related problem of covariate shift; message passing in graphical models (EP and related algorithms); compressed sensing; and links between divergence measures and loss functions.
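One way to compare distributions without estimating them explicitly is to compare samples through a kernel statistic. The following is a minimal sketch of a biased empirical Maximum Mean Discrepancy (MMD) estimate with a Gaussian RBF kernel, one standard statistic for the two-sample problem; the function names, bandwidth, and toy data are assumptions for illustration only.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of a and b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd_biased(x, y, sigma=1.0):
    """Biased empirical squared MMD: the distance between the mean kernel
    embeddings of the two samples. Zero iff the embeddings coincide."""
    kxx = rbf_kernel(x, x, sigma).mean()
    kyy = rbf_kernel(y, y, sigma).mean()
    kxy = rbf_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

rng = np.random.default_rng(0)
same = mmd_biased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd_biased(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
# samples drawn from the same distribution give a much smaller statistic
assert 0.0 <= same < diff
```

A permutation test over the pooled sample would turn this statistic into an actual homogeneity test with a significance level.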
Graphical models provide a natural way to model variables with structured conditional independence properties. They allow for understandable descriptions of models, making them a popular tool in practice. Kernel methods, by contrast, excel at modeling data that need not be structured at all, by implicitly mapping the data into high-dimensional feature spaces (the so-called kernel trick). The popularity of kernel methods is due primarily to their strong theoretical foundations and the relatively simple convex optimization problems they give rise to.
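The kernel trick mentioned above can be illustrated in a few lines: a polynomial kernel evaluates an inner product in a higher-dimensional feature space without ever constructing that space explicitly. The explicit feature map below is written out only to verify the identity; in practice one uses the kernel alone.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for 2-d input:
    (x1^2, x2^2, sqrt(2)*x1*x2)."""
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# the kernel computes the feature-space inner product without forming phi
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
```

For a degree-d kernel on n-dimensional input, the explicit feature space has dimension growing combinatorially in d and n, while the kernel evaluation stays O(n); this is what makes high-dimensional mappings computationally feasible.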
Recent progress towards a unification of the two areas has seen work on Maximum Margin Markov Networks, structured output spaces, and kernelized Conditional Random Fields. Some work has also been done on using fundamental properties of the exponential family of probability distributions to establish links.
The aim of this workshop is to bring together researchers from both communities in order to facilitate interaction. More specifically, the issues we want to address include (but are not limited to) the fundamental theory linking these fields. We want to investigate connections via exponential families, conditional random fields, Markov models, etc. We also wish to explore applications of the kernel trick to graphical models and to study the optimization problems that arise from such a marriage. Uniform-convergence-type results for theoretically bounding the performance of such models will also be discussed.
A large amount of research in machine learning is concerned with classification and regression for real-valued data that can easily be embedded into a Euclidean vector space. This is in stark contrast with many real-world problems, where the data may be a highly structured combination of features, a sequence of symbols, a mixture of different modalities, may have missing variables, etc. To address the problem of learning from non-vectorial data, various methods have been proposed, such as embedding the structures into metric spaces, the extraction and selection of features, proximity-based approaches, parameter constraints in graphical models, Inductive Logic Programming, decision trees, etc. The goal of this workshop is twofold. First, we hope to make the machine learning community aware of the problems arising from domains where non-vector-space data abounds, and to uncover the pitfalls of mapping such data into vector spaces. Second, we will try to find a more uniform structure governing methods for dealing with non-vectorial data, and to understand what principles, if any, underlie the modeling of non-vectorial data.
This workshop aims to bring together people working with Gaussian Process (GP) and Support Vector Machine (SVM) predictors for regression and classification problems. We will open with tutorial-like introductions to the basics so that researchers new to the area can gain an impression of the applicability of the approaches, and will follow with contributed presentations. The final part of the workshop will be an open discussion session. We will bring laptops to provide some software demos, and encourage others to do the same.
We are hosting a one-day informal workshop on Sunday 28th March at Nordkirchen Castle, Germany, the day before the EuroCOLT’99 conference. A particular interest of the organisers is the analysis of kernels and regularization, and this will be one of the themes of the workshop.
The aim is to provide a meeting venue for those who are attending both the Dagstuhl meeting on unsupervised learning, ending on the 26th, and the EuroCOLT conference, starting on the 29th. Those not attending the Dagstuhl meeting are of course very welcome to participate, too. If you wish to attend, consider arriving on the Saturday evening when there will be a meeting to arrange the format of the day.
Many pattern classifiers are represented as thresholded real-valued functions, e.g., sigmoid neural networks, support vector machines, voting classifiers, and Bayesian schemes. There is currently a great deal of interest in algorithms that produce classifiers of this kind with large margins, where the margin is the amount by which the classifier's real-valued prediction lies on the correct side of the threshold. Recent theoretical and experimental results show that many learning algorithms (such as back-propagation, SVM methods, AdaBoost, and bagging) frequently produce classifiers with large margins, and that this leads to better generalization performance. Hence there is good reason to believe that large margin classifiers will become a core method in the standard machine learning toolbox.
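The margin defined above can be stated in one line of code. With labels in {-1, +1} and a threshold at zero, the margin of an example is simply the label times the real-valued output: positive exactly when the prediction falls on the correct side. The scores and labels below are made-up illustrative values.

```python
import numpy as np

def margins(scores, labels):
    """Margins of a thresholded real-valued classifier (threshold 0):
    y * f(x), positive iff the example is correctly classified,
    and larger when the prediction is further from the threshold."""
    return labels * scores

scores = np.array([2.5, -0.3, 0.8, -1.2])   # real-valued outputs f(x)
labels = np.array([+1, +1, -1, -1])          # true labels in {-1, +1}
m = margins(scores, labels)
# the second and third examples fall on the wrong side of the threshold
assert (m > 0).tolist() == [True, False, False, True]
```

Margin-based generalization bounds depend on the whole distribution of these values over the training set, not just on the error rate, which is why two classifiers with identical training error can behave very differently on new data.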
The Support Vector (SV) learning algorithm (Boser, Guyon, Vapnik, 1992; Cortes, Vapnik, 1995; Vapnik, 1995) provides a general method for solving pattern recognition, regression estimation, and operator inversion problems. The method is based on results in the theory of learning with finite sample sizes. The last few years have witnessed an increasing interest in SV machines, due largely to excellent results in pattern recognition, regression estimation, and time series prediction experiments. The purpose of this workshop is (1) to provide an overview of recent developments in SV machines, ranging from theoretical results to applications, (2) to explore connections with other methods, and (3) to identify weaknesses, strengths, and directions for future research for SVMs. We invite contributions on SV machines and related approaches, with empirical support wherever possible.