BioMediator Data Integration and Inference for Functional Annotation of Anonymous Sequences

Cadag E, Louie B, Myler PJ, Tarczy-Hornoch P

Depts. of Medical Education and Biomedical Informatics, Pathobiology, Pediatrics, and Computer Science and Engineering, University of Washington, Seattle, WA USA
Seattle Biomedical Research Institute, Seattle, WA USA


Pac Symp Biocomput. 2007;:343-354.


Abstract

Scientists working on genomics projects often face the difficult task of sifting through large amounts of biological information dispersed across the many online data sources relevant to their area or organism of research. Gene annotation in particular, the process of identifying the functional role of a putative gene, has become increasingly time-consuming and laborious as more genomes are sequenced and the number of candidate genes continues to grow at a near-exponential pace. As a result, genes are left unannotated or, worse, incorrectly annotated. Many groups have attempted to address this annotation backlog with automated annotation systems, but these are geared toward specific organisms and may therefore lack the flexibility and scalability needed to annotate other genomes. In this paper, we present a method and framework that attempt to address the problems inherent in both manual and automatic annotation by coupling a data integration system, BioMediator, to an inference engine in order to elucidate functional annotations. The framework and the heuristics developed are not specific to any particular genome. We validated the method on a set of randomly selected annotated sequences from a variety of organisms. Preliminary results show that the hybrid data integration and inference approach generates functional annotations that are as good as or better than “gold standard” annotations approximately 80% of the time.
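The abstract describes the coupling of integrated data with an inference step only at a high level. As a purely illustrative sketch, and not the authors' BioMediator schema, heuristics, or inference engine, the Python below assumes that each source's output has already been mapped to a common, hypothetical `Evidence` record and applies one hypothetical rule: accept a candidate function only when enough independent sources support it with sufficient confidence. The source names, score scale, and thresholds are all assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """Hypothetical common record that each source is mapped into."""
    source: str        # hypothetical source name, e.g. "similarity"
    sequence_id: str   # identifier of the anonymous sequence
    function: str      # candidate functional label reported by the source
    score: float       # assumed to be normalized to [0, 1]


def integrate(*sources):
    """Merge evidence lists from several sources, grouped by sequence id."""
    merged = defaultdict(list)
    for records in sources:
        for ev in records:
            merged[ev.sequence_id].append(ev)
    return merged


def infer_annotation(evidence, min_score=0.5, min_sources=2):
    """Hypothetical rule: accept a function if at least `min_sources`
    distinct sources report it with score >= `min_score`.
    Returns None when no function meets the rule."""
    support = defaultdict(set)   # function -> set of supporting sources
    total = defaultdict(float)   # function -> summed qualifying scores
    for ev in evidence:
        if ev.score >= min_score:
            support[ev.function].add(ev.source)
            total[ev.function] += ev.score
    candidates = [f for f, srcs in support.items() if len(srcs) >= min_sources]
    if not candidates:
        return None
    # Prefer the highest summed score; break ties alphabetically.
    return min(candidates, key=lambda f: (-total[f], f))


if __name__ == "__main__":
    similarity = [Evidence("similarity", "seq1", "kinase", 0.9),
                  Evidence("similarity", "seq2", "transporter", 0.4)]
    domains = [Evidence("domains", "seq1", "kinase", 0.8),
               Evidence("domains", "seq2", "transporter", 0.7)]
    merged = integrate(similarity, domains)
    for seq_id in sorted(merged):
        print(seq_id, infer_annotation(merged[seq_id]))
```

Run as a script, this prints `seq1 kinase` and `seq2 None`: the second sequence is left unannotated because only one source clears the score threshold. Withholding a label in that case reflects the abstract's concern that an incorrect annotation is worse than none.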

