Genotype Phenotype Mapping in RNA Viruses - Disjunctive Normal Form Learning


Chuang Wu, Andrew S. Walsh, Roni Rosenfeld



School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
Email: chuangw@cs.cmu.edu, awalsh@cs.cmu.edu, *Roni.Rosenfeld@cs.cmu.edu

Pacific Symposium On Biocomputing 16:62-73(2011)


Abstract

RNA virus phenotypic changes often result from multiple alternative molecular mechanisms, where each mechanism involves changes to a small number of key residues. Accordingly, we propose to learn genotype-phenotype functions using Disjunctive Normal Form (DNF) as the assumed functional form. In this study we develop DNF learning algorithms that attempt to construct predictors as Boolean combinations of covariates. We demonstrate the learning algorithms' consistency and efficiency on simulated sequences, and establish their biological relevance using a variety of real RNA virus datasets representing different viral phenotypes, including drug resistance, antigenicity, and pathogenicity. We compare our algorithms with previously published machine learning algorithms in terms of prediction quality: leave-one-out performance shows superior accuracy to other machine learning algorithms on the HIV drug resistance dataset and the UCI promoter gene dataset. The algorithms are powerful in inferring the genotype-phenotype mapping from a moderate number of labeled sequences, as are typically produced in mutagenesis experiments. They can also greedily learn DNFs from large datasets. The Java implementation of our algorithms will be made publicly available.
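
As an illustration of the assumed functional form only, the following minimal sketch evaluates a DNF predictor on aligned sequences. It is not the authors' learning algorithm or their Java implementation. The representation (a clause as a mapping from 0-based alignment position to required residue), the names `clause_holds` and `dnf_predict`, and the toy sequences are all hypothetical choices made for this example.

```python
from typing import Dict, List

# Assumed representation (not from the paper): a literal is a
# (position, residue) pair, a clause is a conjunction of literals stored as
# {position: residue}, and a DNF is a disjunction (list) of clauses.
Clause = Dict[int, str]


def clause_holds(clause: Clause, seq: str) -> bool:
    """True if the sequence carries every residue the clause requires."""
    return all(seq[pos] == res for pos, res in clause.items())


def dnf_predict(dnf: List[Clause], seq: str) -> bool:
    """True if at least one clause (one alternative mechanism) is satisfied."""
    return any(clause_holds(clause, seq) for clause in dnf)


# Hypothetical toy DNF: (K at position 1 AND V at position 4) OR (R at position 2).
toy_dnf = [{1: "K", 4: "V"}, {2: "R"}]

# Hypothetical aligned sequences, all of the same length.
sequences = ["AKTMV", "AKTMA", "ATRMA"]
predictions = [dnf_predict(toy_dnf, s) for s in sequences]
```

Each clause stands for one alternative mechanism involving a few key residues, and the disjunction over clauses captures the idea that any one mechanism suffices to produce the phenotype.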

