PSB 2003 Tutorial

Beyond Homology-based Methods for the Elucidation of Function in Data Mining

Liping Wei, John Shon, and John Park, Nexus Genomics, Inc.

Description

With the exponential growth in the number of available DNA, protein, and genomic sequences, the computational elucidation of gene and protein function has been, and will remain, an important area of focus in bioinformatics. The majority of today's computational methods identify function by homology, determined by similarity of sequence or 3D protein structure between the new gene and a gene whose function has already been determined. However, about 40-60% of new genes have no significant homology to any gene of known function, so their function remains unknown.

To push the boundary of function elucidation, researchers have in recent years developed a number of new methods that identify gene function based on nonhomology measures. In this tutorial, we will first briefly review the most widely used traditional homology-based methods. We will then review in detail a number of new nonhomology methods for the elucidation of function, covering both the underlying algorithms and their practical uses, and discuss their advantages and caveats. The nonhomology methods take advantage of new types of data such as whole genomes and large transcript libraries. We will also discuss ways to incorporate both homology and nonhomology methods in a data mining pipeline.

The appropriate audience for this tutorial includes:

Academic and industry biologists who use bioinformatics tools for data mining in their research. This tutorial will broaden their knowledge base and help them push the boundary of function elucidation for their newly discovered genes and proteins.
Bioinformatics scientists who develop algorithms and tools. This tutorial will bring them up to date on the latest nonhomology algorithms.
Bioinformaticians who maintain analysis pipelines for their group. This tutorial will bring more options to their attention to strengthen their tool kits and pipelines.

This tutorial requires the audience to have basic biological knowledge of genes, genomes, and functions, and an understanding of the problem of computational elucidation of gene function. A sophisticated background in computing and statistics is not required.

Biographical sketches

Liping Wei holds a Ph.D. in biomedical informatics from Stanford University, where she worked with Dr. Russ Altman. She is currently a principal of the bioinformatics consulting group at Nexus Genomics in Mountain View, California. She has extensive research experience in target discovery at Stanford University and Exelixis, Inc., and has published in areas including automated annotation of genes, elucidation of function from protein structures, and systematic analysis of protein structure-function relationships. She has given numerous acclaimed bioinformatics lectures, including at Stanford University, S-Star.org, Beijing Genomics Institute, and many biotech companies.

John Shon holds an M.D. degree and an M.S. degree in biomedical informatics, both from Stanford University. He is a principal of the bioinformatics consulting group at Nexus Genomics. John is a board-certified physician and a Clinical Instructor in Internal Medicine at the Stanford Medical School. He has published in the areas of oncogenes and microbiology, as well as web-based biomedical information systems.

John Park has extensive research experience over the past two decades in both academia and industry, including at Stanford University and Arris Pharmaceuticals. He has published in the areas of knowledge modeling, ontology mapping, database integration, and computer-aided drug design. John holds a B.A. in biochemistry from Harvard College. He is currently a principal of the bioinformatics consulting group at Nexus Genomics.