Angela Gasdaska1,†, Derek Friend2,†, Rachel Chen3, Jason Westra4, Matthew Zawistowski5, William Lindsey4, Nathan Tintle4,*
1Department of Mathematics and Computer Science and Department of Quantitative Theory and Methods, Emory University
2Department of Geography, University of Nevada
3Department of Statistics, North Carolina State University
4Department of Math, Computer Science, and Statistics, Dordt College
5Department of Biostatistics, University of Michigan
†Authors contributed equally to this work
*Corresponding author
Email: aegasdaska@gmail.com, derekfriend@outlook.com, rschen@ncsu.edu, westrajason@hotmail.com, mattz@umich.edu, William.Lindsey@dordt.edu, Nathan.Tintle@dordt.edu
Pacific Symposium on Biocomputing 24:391-402(2019)
© 2019 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.
As genetic sequencing becomes less expensive and data sets linking genetic data to medical records (e.g., biobanks) become larger and more common, data privacy and computational challenges must increasingly be addressed in order to realize the benefits of these data sets. One way to alleviate these issues is to use already-computed summary statistics (e.g., slopes and standard errors from a regression model of a phenotype on a genotype). If groups share summary statistics from their analyses of biobanks, many of the privacy issues and computational challenges of accessing these data could be bypassed. In this paper we explore the possibility of using summary statistics from simple linear models of phenotype on genotype to make inferences about more complex phenotypes (those derived from two or more simple phenotypes). We provide exact formulas for the slope, intercept, and standard error of the slope for linear regressions when combining phenotypes. The derived equations are validated via simulation and tested on a real data set exploring the genetics of fatty acids.
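To make the idea concrete, the following minimal sketch illustrates the simplest case only: a complex phenotype defined as the sum of two simple phenotypes, each regressed on the same genotype. Because least-squares estimates are linear in the response, the slope and intercept for the sum equal the sums of the individual slopes and intercepts. This is an illustration under stated assumptions, not the formulas derived in the paper: the data are simulated, the effect sizes are arbitrary, and the helper name `ols` is hypothetical. The standard error of the combined slope is not reproduced here, since it also depends on how the two phenotypes covary.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 500

# Assumption: genotypes coded 0/1/2 and two simulated simple phenotypes
# with arbitrary, made-up effect sizes.
g = [rng.choice([0, 1, 2]) for _ in range(n)]
y1 = [0.5 + 0.30 * gi + rng.gauss(0, 1) for gi in g]
y2 = [1.0 - 0.10 * gi + rng.gauss(0, 1) for gi in g]

# Summary statistics a group might share for each simple phenotype.
a1, b1 = ols(g, y1)
a2, b2 = ols(g, y2)

# Direct fit of the combined phenotype, which needs individual-level data.
a_sum, b_sum = ols(g, [u + v for u, v in zip(y1, y2)])

# Both gaps should be zero up to floating-point rounding.
slope_gap = abs(b_sum - (b1 + b2))
intercept_gap = abs(a_sum - (a1 + a2))
print(slope_gap, intercept_gap)
```

The comparison against a direct fit is there only to check the identity; in the setting the abstract describes, only `a1`, `b1`, `a2`, and `b2` would be available.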