Phylogenetic motif detection by expectation-maximization on evolutionary mixtures

Moses AM, Chiang DY, Eisen MB

Graduate Group in Biophysics, Center for Integrative Genomics, University of California, Berkeley, USA. amoses@ocf.berkeley.edu

Pac Symp Biocomput. 2004:324-35.


Abstract

The preferential conservation of transcription factor binding sites implies that non-coding sequence data from related species will prove a powerful asset to motif discovery. We present a unified probabilistic framework for motif discovery that incorporates evolutionary information. We treat aligned DNA sequences as a mixture of two evolutionary models, one for motif and one for background, and, following the example of the MEME program, provide an algorithm to estimate the parameters by Expectation-Maximization. We examine a variety of evolutionary models and show that our approach can take advantage of phylogenetic information to avoid false positives and to discover motifs upstream of groups of characterized target genes. We also compare our method with traditional motif finding restricted to conserved regions. An implementation will be made available at http://rana.lbl.gov.
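As a rough illustration of the mixture idea, the sketch below scores aligned windows from two species under a motif model and a background model, then runs EM. It is not the authors' implementation, and everything specific in it is an assumption made for the example: the F81-style substitution model, the parameters `mu_motif` and `mu_bg`, the toy frequency matrix, and the window data. For brevity it keeps the motif and background parameters fixed and re-estimates only the mixing proportion; a full method would also update the motif model in the M-step.

```python
def f81_pair_prob(a, b, pi, mu):
    """Joint probability of aligned bases (a, b) under an F81-style model.

    pi: dict of equilibrium base frequencies at this position.
    mu: assumed probability that the second base was redrawn from pi
        since the two species diverged (small mu = strong conservation).
    """
    same = 1.0 if a == b else 0.0
    return pi[a] * ((1.0 - mu) * same + mu * pi[b])


def window_likelihood(seq1, seq2, columns, mu):
    """Likelihood of an aligned window given per-position frequencies."""
    p = 1.0
    for a, b, pi in zip(seq1, seq2, columns):
        p *= f81_pair_prob(a, b, pi, mu)
    return p


def em_mixing_weight(windows, motif, background, mu_motif, mu_bg,
                     lam=0.5, n_iter=50):
    """EM for the motif mixing proportion with fixed component models."""
    bg_cols = [background] * len(motif)
    # Component likelihoods do not change, so compute them once.
    lik = [(window_likelihood(s1, s2, motif, mu_motif),
            window_likelihood(s1, s2, bg_cols, mu_bg))
           for s1, s2 in windows]
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif.
        post = [lam * m / (lam * m + (1.0 - lam) * b) for m, b in lik]
        # M-step: the new mixing proportion is the mean posterior.
        lam = sum(post) / len(post)
    return lam, post


def column(consensus):
    """Hypothetical motif column: 0.85 on the consensus base, 0.05 elsewhere."""
    return {base: (0.85 if base == consensus else 0.05) for base in "ACGT"}


motif = [column(c) for c in "TGACTC"]          # toy motif, for illustration only
background = {base: 0.25 for base in "ACGT"}   # uniform background

windows = [
    ("TGACTC", "TGACTC"),  # motif match, conserved in both species
    ("ACGTAC", "ATGGAC"),  # unrelated sequence, partly diverged
    ("TGACTC", "CATGGA"),  # motif match in one species only
    ("CATGGA", "CATGGA"),  # conserved, but not a motif match
]

lam, post = em_mixing_weight(windows, motif, background,
                             mu_motif=0.2, mu_bg=0.6)
for (s1, s2), z in zip(windows, post):
    print(s1, s2, round(z, 4))
```

In this toy setting only the first window receives a high motif posterior. The third window matches the consensus in one species but is not conserved, and the fourth is conserved but does not match, so both are assigned to the background, which is the kind of false positive the evolutionary mixture is meant to avoid.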

