PSB 2002 Tutorial

Support Vector and Kernel Methods for Bioinformatics

Nello Cristianini

Description

Pattern recognition problems in Bioinformatics are characterized by high-dimensional and noisy data, and by the need to incorporate prior knowledge into the system and to merge heterogeneous sources of data. The class of Kernel Methods has been designed to work in such conditions, and delivers state-of-the-art performance in many bioinformatics applications.

Based on recent developments in learning theory, this new generation of learning systems introduces several theoretical and computational innovations. Its best-known instance, the Support Vector Machine, has been used to analyze DNA microarray data, sequence data, phylogenetic information, promoter region information, etc. Although the methods are sophisticated, their basic ideas can be introduced with a few illustrative examples.

One of the main features of these systems is their modularity: they are formed by a general-purpose learning module and a problem-specific kernel. For example, a classification module can be coupled with a sequence-matching kernel to obtain a protein sequence classifier, as demonstrated by the use of Support Vector Machines for protein homology detection. Learning modules have been developed to perform regression, novelty detection, ranking, clustering, etc. Each of these modules can be combined with any of the kernels being developed for specific applications (e.g., sequence kernels).
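
The modularity described above can be illustrated with a minimal sketch. It is not taken from the tutorial: for brevity it assumes a kernel perceptron as the learning module instead of a Support Vector Machine, and a toy 2-mer "spectrum" kernel as the sequence kernel. All function names and data are hypothetical. The point is only that the same learning code runs unchanged with either kernel.

```python
from collections import Counter

def spectrum_kernel(s, t, k=2):
    """Toy sequence kernel: inner product of k-mer count vectors."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(cs[m] * ct[m] for m in cs)

def linear_kernel(x, y):
    """Ordinary dot product between two numeric vectors."""
    return sum(a * b for a, b in zip(x, y))

def train_kernel_perceptron(X, y, kernel, epochs=10):
    """General-purpose learning module: it sees the data only through `kernel`."""
    alpha = [0] * len(X)  # number of mistakes made on each training example
    for _ in range(epochs):
        mistakes = 0
        for i, (xi, yi) in enumerate(zip(X, y)):
            score = sum(a * yj * kernel(xj, xi) for a, xj, yj in zip(alpha, X, y))
            if yi * score <= 0:  # misclassified (or undecided): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alpha

def predict(X, y, alpha, kernel, x):
    """Classify x as +1 or -1 using the kernel expansion learned in training."""
    score = sum(a * yj * kernel(xj, x) for a, xj, yj in zip(alpha, X, y))
    return 1 if score > 0 else -1

# Same learning module, sequence kernel (made-up toy sequences).
seqs = ["AAAAAA", "AAAATA", "CCCCCC", "CCCGCC"]
seq_labels = [1, 1, -1, -1]
seq_alpha = train_kernel_perceptron(seqs, seq_labels, spectrum_kernel)
seq_pred = predict(seqs, seq_labels, seq_alpha, spectrum_kernel, "AAATAA")

# Same learning module, linear kernel on numeric vectors.
vecs = [(2, 1), (1, 2), (-1, -2), (-2, -1)]
vec_labels = [1, 1, -1, -1]
vec_alpha = train_kernel_perceptron(vecs, vec_labels, linear_kernel)
vec_pred = predict(vecs, vec_labels, vec_alpha, linear_kernel, (-3, -1))

print(seq_pred, vec_pred)
```

Swapping `spectrum_kernel` for `linear_kernel` changes the kind of data the classifier accepts without touching the training or prediction code, which is the separation between learning module and kernel that the paragraph describes.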

In the first part of this tutorial we will introduce the basics of this method, gradually discussing all the main components and features of this approach, and giving simple examples and pointers to literature.

In the second part we will review the main applications of the method to bioinformatics problems: from remote protein homology detection to gene function classification, from the use of promoter information to cancer type classification.

Biographical Sketch:

Dr. Nello Cristianini is a senior scientist at BIOwulf Genomics. He has been working in the field of Support Vector and Kernel Machines for many years, co-authoring the first textbook on this topic and editing two journal special issues. He is an action editor of the Journal of Machine Learning Research, and a member of the steering board of kernel-machines.org.


Updated: February 20, 2002