Enabling High-Throughput Genotype-Phenotype Associations in the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) Project as Part of the Population Architecture Using Genomics and Epidemiology (PAGE) Study


William S. Bush1, Jonathan Boston2, Sarah A. Pendergrass3, Logan Dumitrescu4, Robert Goodloe4, Kristin Brown-Gentry5, Sarah Wilson5, Bob McClellan Jr.5, Eric Torstenson6, Melissa A. Basford7, Kylee L. Spencer8, Marylyn D. Ritchie9, Dana C. Crawford10



1Department of Biomedical Informatics, Center for Human Genetics Research, Vanderbilt University; 2Center for Human Genetics Research, Vanderbilt University; 3Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University; 4Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University; 5Center for Human Genetics Research, Vanderbilt University; 6Center for Human Genetics Research, Vanderbilt University; 7Office of Research, Office of Personalized Medicine, Vanderbilt University; 8Biology and Environmental Science, Heidelberg University; 9Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University; 10Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University
Email: william.s.bush@vanderbilt.edu

Pacific Symposium on Biocomputing 18:373-384(2013)


Abstract

Genetic association studies have rapidly become a major tool for identifying the genetic basis of common human diseases. The advent of cost-effective genotyping, coupled with large collections of samples linked to clinical outcomes and quantitative traits, now makes it possible to systematically characterize genotype-phenotype relationships in diverse populations and extensive datasets. To capitalize on these advancements, the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) project, as part of the collaborative Population Architecture using Genomics and Epidemiology (PAGE) study, accesses two collections: the National Health and Nutrition Examination Surveys (NHANES) and BioVU, Vanderbilt University’s biorepository linked to de-identified electronic medical records. We describe herein the workflows for accessing and using the epidemiologic (NHANES) and clinical (BioVU) collections, where each workflow has been customized to reflect the content and data access limitations of its source. We also describe the process by which these data are generated, standardized, and shared for meta-analysis among the PAGE study sites. As a specific example of the use of BioVU, we describe the data mining efforts to define cases and controls for genetic association studies of common cancers in PAGE. Collectively, the efforts described here are a generalized outline for many of the successful approaches that can be used in the era of high-throughput genotype-phenotype associations for moving biomedical discovery forward to new frontiers of data generation and analysis.

