ANN-Spec: a method for discovering transcription factor binding sites with improved specificity

Workman CT, Stormo GD

Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark. workman@cbs.dtu.dk

Pac Symp Biocomput. 2000;:467-78.


Abstract

This work describes ANN-Spec, a machine learning algorithm, and its application to discovering ungapped patterns in DNA sequences. The approach makes use of an Artificial Neural Network and a Gibbs sampling method to define the Specificity of a DNA-binding protein. ANN-Spec searches for the parameters of a simple network (or weight matrix) that will maximize the specificity for binding sequences of a positive set compared to a background sequence set. Binding sites in the positive data set are found with the resulting weight matrix, and these sites are then used to define a local multiple sequence alignment. Training complexity is O(lN), where l is the width of the pattern and N is the size of the positive training data. A quantitative comparison of ANN-Spec and a few related programs is presented. The comparison shows that ANN-Spec finds patterns of higher specificity when training with a background data set. The program and documentation are available from the authors for UNIX systems.
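As a rough illustration of the weight-matrix idea described in the abstract, the following Python sketch scores every ungapped window of a sequence, picks the best site per sequence, and compares positive and background sets. It is not the ANN-Spec implementation: the function names, the `consensus_matrix` helper, the toy weights, and the particular `specificity` formula are all assumptions made for illustration, and the network training and Gibbs sampling steps are omitted entirely.

```python
import math

BASES = "ACGT"

def consensus_matrix(consensus, match=1.0, mismatch=-1.0):
    """Hypothetical helper: build a toy weight matrix (one row per position)."""
    return [[match if b == c else mismatch for b in BASES] for c in consensus]

def score_window(matrix, window):
    """Sum the per-position weights for one window of len(matrix) bases."""
    return sum(matrix[i][BASES.index(b)] for i, b in enumerate(window))

def window_scores(matrix, seq):
    """Score every ungapped window of the matrix width in seq."""
    width = len(matrix)
    return [score_window(matrix, seq[i:i + width])
            for i in range(len(seq) - width + 1)]

def best_site(matrix, seq):
    """Return (score, offset) of the highest-scoring window in seq."""
    return max((s, i) for i, s in enumerate(window_scores(matrix, seq)))

def specificity(matrix, positives, background):
    """Illustrative stand-in objective (an assumption, not the paper's):
    mean best-site score in the positive set minus the log of the mean
    exp(score) over all background windows."""
    pos = sum(best_site(matrix, s)[0] for s in positives) / len(positives)
    bg = [x for s in background for x in window_scores(matrix, s)]
    return pos - math.log(sum(math.exp(x) for x in bg) / len(bg))

matrix = consensus_matrix("TGA")
print(best_site(matrix, "CCTGACC"))  # (3.0, 2)
print(specificity(matrix, ["CCTGACC", "TGATTT"], ["CCCCCC", "GGGGGG"]))
```

Scanning one sequence costs time proportional to the pattern width times the sequence length, which is in the spirit of the O(lN) figure quoted above, although the abstract does not spell out how that bound is derived.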

