The genomic data available to computational biologists represents the product of the complex processes of evolution. In particular, the forces of mutation, duplication, and selection have acted to sculpt modern protein sequence and structure in the context of changing functional requirements. Just as crystallographers are able to determine protein structures through an analysis of X-ray diffraction patterns, scientists are learning to read the evolutionary history of proteins in order to infer and explain both structures and functions. In the simplest case, this has taken the form of identifying homologous relationships, with the idea that related proteins are likely to share structure and may share functions and mechanisms. There have also been effective uses of multiple related sequences to identify protein structures. Such approaches have often been "black-box" techniques that include the patterns of conservation and variation at different locations but ignore the underlying evolutionary dynamics and the exact nature of the evolutionary relationships. More recent approaches have attempted to model the evolutionary process explicitly so as to better understand the historical record available in growing genomic databases.
Conversely, the structure and function of a protein determine the types of selective pressure acting on it. As phylogenetic inference methods improve, we have come to a better understanding of the effect of phylogenetic inference (and sequence alignment) on inferences of substitutional and coevolutionary patterns. Any realistic attempt to model the evolutionary process must take into account the types of selective pressure that occur at the amino acid level, specifically the heterogeneity of selective pressures. To develop these models, we need a better understanding of the relationship between structure and changes in sequence, gained both by considering the evolutionary record and through simple computational models.
David Pollock
Theoretical Biology and Biophysics, MS K-710
Los Alamos National Laboratory
Los Alamos, NM 87545.
dpollock@lanl.gov