Genotype Phenotype Mapping in RNA Viruses - Disjunctive Normal Form Learning

Chuang Wu, Andrew S. Walsh, Roni Rosenfeld
School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
Email: chuangw@cs.cmu.edu, awalsh@cs.cmu.edu, *Roni.Rosenfeld@cs.cmu.edu
Pacific Symposium on Biocomputing 16:62-73 (2011)
Abstract

RNA virus phenotypic changes often result from multiple alternative molecular mechanisms, where each mechanism involves changes to a small number of key residues. Accordingly, we propose to learn genotype-phenotype functions using Disjunctive Normal Form (DNF) as the assumed functional form: a disjunction (OR) of conjunctions (ANDs), so that each conjunction can represent one mechanism. In this study we develop DNF learning algorithms that attempt to construct predictors as Boolean combinations of covariates. We demonstrate the learning algorithms' consistency and efficiency on simulated sequences, and establish their biological relevance using a variety of real RNA virus datasets representing different viral phenotypes, including drug resistance, antigenicity, and pathogenicity. We compare our algorithms with previously published machine learning algorithms in terms of prediction quality: in leave-one-out evaluation, our algorithms achieve higher accuracy than the other machine learning algorithms on the HIV drug resistance dataset and the UCI promoter gene dataset. The algorithms can infer the genotype-phenotype mapping from a moderate number of labeled sequences, such as those typically produced in mutagenesis experiments. They can also greedily learn DNFs from large datasets. The Java implementation of our algorithms will be made publicly available.
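To make the DNF functional form concrete, here is a minimal sketch of a generic greedy sequential-covering learner for DNFs over aligned sequences. It is not the authors' algorithm or their Java implementation, whose details this abstract does not give. The encoding is an assumption: a literal `(i, r)` means "residue `r` at aligned position `i`", a conjunction is a set of literals, and a DNF is a list of conjunctions. The precision-based literal score and all function names (`satisfies`, `predict`, `learn_term`, `learn_dnf`) are hypothetical choices for illustration, and the toy sequences are invented rather than taken from any of the datasets above.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical encoding (an assumption, not the paper's): a literal (i, r) means
# "residue r at aligned position i"; a conjunction is a frozenset of literals;
# a DNF is a list of conjunctions, one per (putative) mechanism.
Literal = Tuple[int, str]
Term = FrozenSet[Literal]


def satisfies(seq: str, term: Term) -> bool:
    """A conjunction holds if every one of its literals holds for seq."""
    return all(seq[i] == r for i, r in term)


def predict(dnf: List[Term], seq: str) -> bool:
    """A DNF predicts the phenotype if any of its conjunctions holds."""
    return any(satisfies(seq, term) for term in dnf)


def learn_term(pos: Sequence[str], neg: Sequence[str], max_len: int) -> Term:
    """Grow one conjunction greedily, scoring literals by precision on the
    examples the partial term still covers (ties broken by positives kept)."""
    term: set = set()
    cov_pos, cov_neg = list(pos), list(neg)
    while cov_neg and len(term) < max_len:
        # Only literals true in some still-covered positive are considered,
        # so the term always keeps covering at least one positive.
        cands = sorted({(i, s[i]) for s in cov_pos for i in range(len(s))} - term)
        if not cands:
            break

        def score(lit: Literal) -> Tuple[float, int]:
            i, r = lit
            p = sum(s[i] == r for s in cov_pos)
            n = sum(s[i] == r for s in cov_neg)
            return (p / (p + n), p)

        best = max(cands, key=score)  # first maximum in sorted order: deterministic
        term.add(best)
        cov_pos = [s for s in cov_pos if s[best[0]] == best[1]]
        cov_neg = [s for s in cov_neg if s[best[0]] == best[1]]
    return frozenset(term)


def learn_dnf(pos: Sequence[str], neg: Sequence[str], max_len: int = 3) -> List[Term]:
    """Sequential covering: add consistent terms until all positives are covered."""
    dnf: List[Term] = []
    uncovered = list(pos)
    while uncovered:
        term = learn_term(uncovered, neg, max_len)
        if any(satisfies(s, term) for s in neg):
            break  # no consistent term of length <= max_len; stop rather than loop
        dnf.append(term)
        uncovered = [s for s in uncovered if not satisfies(s, term)]
    return dnf


def show(dnf: List[Term]) -> str:
    """Render a DNF as text, e.g. '(K@0) OR (R@2 AND Y@3)'."""
    return " OR ".join(
        "(" + " AND ".join(f"{r}@{i}" for i, r in sorted(t)) + ")" for t in dnf
    )


# Toy data: phenotype = K at position 0, OR (R at position 2 AND Y at position 3).
positives = ["KAAA", "KAYA", "AARY", "YARY"]
negatives = ["AAAA", "AARA", "AAAY", "YAYA"]
print(show(learn_dnf(positives, negatives)))  # (K@0) OR (R@2 AND Y@3)
```

On this toy data the learner recovers two conjunctions, matching the two planted "mechanisms": a single-residue change and a two-residue change. Restricting each term to at most `max_len` literals mirrors the abstract's premise that each mechanism involves only a few key residues.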