ATHENA: A Tool for Meta-Dimensional Analysis Applied to Genotypes and Gene Expression Data to Predict HDL Cholesterol Levels


Emily R. Holzinger1, Scott M. Dudek2, Alex T. Frase3, Ronald M. Krauss4, Marisa W. Medina5, Marylyn D. Ritchie6



1Center for Human Genetics Research, Vanderbilt University; 2Center for Systems Genomics, Pennsylvania State University; 3Center for Systems Genomics, Pennsylvania State University; 4Children’s Hospital Oakland Research Institute; 5Children’s Hospital Oakland Research Institute; 6Center for Systems Genomics, Pennsylvania State University
Email: emily.r.holzinger@vanderbilt.edu

Pacific Symposium on Biocomputing 18:385-396(2013)


Abstract

Technology is driving the field of human genetics research, with advances in techniques that generate high-throughput data interrogating various levels of biological regulation. With this massive amount of data comes the important task of using powerful bioinformatics techniques to sift through the noise and find true signals that predict various human traits. A popular analytical method thus far has been the genome-wide association study (GWAS), which assesses the association of single nucleotide polymorphisms (SNPs) with the trait of interest. Unfortunately, GWAS has not been able to explain a substantial proportion of the estimated heritability for most complex traits. Given the inherently complex nature of biology, this shortfall may stem in part from the simplistic study design. A more powerful alternative may be a meta-dimensional analysis: a systems biology approach that integrates different types of data. For this study, we used the Analysis Tool for Heritable and Environmental Network Associations (ATHENA) to integrate high-throughput SNPs and gene expression variables (EVs) to predict high-density lipoprotein cholesterol (HDL-C) levels. We generated multivariable models that consisted of SNPs only, EVs only, and SNPs + EVs, with testing r-squared values of 0.16, 0.11, and 0.18, respectively. Additionally, using just the SNPs and EVs from the best models, we generated a model with a testing r-squared of 0.32. A linear regression model with the same variables resulted in an adjusted r-squared of 0.23. With this systems biology approach, we were able to integrate different types of high-throughput data to generate meta-dimensional models that are predictive of HDL-C in our data set. Moreover, our modeling method captured more of the HDL-C variation than a linear regression model that included the same variables.
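
The abstract reports both testing r-squared and adjusted r-squared values. As a minimal sketch of how these two quantities are commonly defined, the Python below computes the coefficient of determination from observed and predicted values, and the adjusted version that penalizes the number of predictors. The function names are hypothetical, and the standard textbook definitions are assumed here; the abstract does not state exactly how the authors computed their values, and this is not ATHENA code.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after prediction.
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Adjusted r-squared: shrinks r2 according to the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)
```

A testing r-squared would apply `r_squared` to held-out samples that were not used to fit the model, whereas adjusted r-squared is typically computed on the samples used to fit a regression, so the two figures are related but not strictly like-for-like.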

