Next-Generation Analysis of Cataracts: Determining Knowledge-Driven Gene-Gene Interactions Using Biofilter, and Gene-Environment Interactions Using the PhenX Toolkit


Sarah A. Pendergrass¹, Shefali S. Verma², Emily R. Holzinger³, Carrie B. Moore⁴, John Wallace⁵, Scott M. Dudek⁶, Wayne Huggins⁷, Terrie Kitchner⁸, Carol Waudby⁹, Richard Berg¹⁰, Catherine A. McCarty¹¹, Marylyn D. Ritchie¹²



¹⁻⁶,¹² Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Pennsylvania State University
⁷ RTI International
⁸⁻¹⁰ Marshfield Clinic
¹¹ Essential Rural Health
Email: sap29@psu.edu

Pacific Symposium on Biocomputing 18:147-158(2013)


Abstract

Investigating the association between biobank-derived genomic data and information from linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits; in this setting, cases and controls are defined by electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, part of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to search for gene-gene and gene-environment interactions among 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, looking for associations with cataract risk beyond those detectable by single SNP-phenotype tests. To build SNP-SNP interaction models, we used Biofilter, a prior-knowledge-driven filtering method, to reduce the multiple testing burden of exploring the vast number of interaction models possible among so many SNPs. Using Biofilter, we developed 57,376 prior-knowledge-directed SNP-SNP models to test for association with cataract status. We selected models that required support from 6 sources of external domain knowledge. We identified 5 models associated with cataract status in which both the interaction term and the overall model had p < 0.05. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit (smoking, UV exposure, and alcohol use), each of which has previously been associated with cataract formation. We found a total of 288 models with an interaction term associated with cataract status at p ≤ 1×10⁻⁴.
Our results show that these approaches enable searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR-based approach provides an additional source of data for building explanatory models of the etiology of complex diseases and outcomes such as cataracts.
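The abstract does not spell out the statistical model behind the SNP-SNP tests. As a minimal sketch, assuming one common choice, the code below fits a logistic regression of case/control status on two additively coded SNPs (0/1/2) plus their product term, tests the interaction with a 1-df likelihood-ratio test, and tests the overall model with a 3-df likelihood-ratio test against an intercept-only model, mirroring the two p < 0.05 criteria described above. This is not Biofilter or the authors' software; the function names (`fit_logistic`, `chi2_sf`, `snp_snp_test`) and the toy data are hypothetical, and plain Python is used only to keep the example self-contained. The final line shows the scale of the testing burden that knowledge-driven filtering avoids: every pairwise combination of 529,431 SNPs versus the 57,376 Biofilter models.

```python
import math
from itertools import product

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson; returns the maximized log-likelihood."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        ll += math.log(mu) if yi else math.log(1.0 - mu)
    return ll

def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 3 only."""
    q = math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        q += math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)
    elif df != 1:
        raise ValueError("this sketch supports only df = 1 or df = 3")
    return q

def snp_snp_test(g1, g2, y):
    """Return (interaction p, overall-model p) for the model y ~ g1 + g2 + g1*g2."""
    null = [[1.0] for _ in y]
    main = [[1.0, a, b] for a, b in zip(g1, g2)]
    full = [[1.0, a, b, a * b] for a, b in zip(g1, g2)]
    ll0, ll1, ll2 = fit_logistic(null, y), fit_logistic(main, y), fit_logistic(full, y)
    p_int = chi2_sf(max(2.0 * (ll2 - ll1), 0.0), df=1)
    p_model = chi2_sf(max(2.0 * (ll2 - ll0), 0.0), df=3)
    return p_int, p_model

# Hypothetical toy data: cases are enriched only in genotype cells where
# both SNPs carry at least one minor allele.
g1, g2, y = [], [], []
for a, b in product(range(3), repeat=2):
    for status, count in ((1, 10 + 5 * a * b), (0, 10)):
        g1 += [a] * count
        g2 += [b] * count
        y += [status] * count

p_int, p_model = snp_snp_test(g1, g2, y)
print(f"interaction p = {p_int:.3g}, overall model p = {p_model:.3g}")
print(f"exhaustive SNP pairs: {math.comb(529_431, 2):,} vs 57,376 Biofilter models")
```

A gene-environment model can take the same shape under this assumption: replace `g2` with an environmental exposure such as a smoking indicator and keep the product term. A real analysis would typically also adjust for covariates, which this sketch omits.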

