Empowering Multi-Cohort Gene Expression Analysis to Increase Reproducibility

Winston A. Haynes1,2,3, Francesco Vallania1, Charles Liu1,4, Erika Bongen1, Aurelie Tomczak1,3, Marta Andres-Terrè1, Shane Lofgren1, Andrew Tam1, Cole A. Deisseroth1,4, Matthew D. Li1, Timothy E. Sweeney1,3, and Purvesh Khatri1,3


1Stanford Institute for Immunity, Transplantation, and Infection, Stanford University
2Biomedical Informatics Training Program, Stanford University
3Stanford Center for Biomedical Informatics Research, Stanford University
4Stanford Institutes of Medicine Research Program, Stanford University
Email: pkhatri@stanford.edu

Pacific Symposium on Biocomputing 22:144-153 (2017)

© 2017 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.


Abstract

A major contributor to the scientific reproducibility crisis has been that results from homogeneous, single-center studies do not generalize to heterogeneous, real-world populations. Multi-cohort gene expression analysis has helped to increase reproducibility by aggregating data from diverse populations into a single analysis. To make multi-cohort analysis more feasible, we have assembled an analysis pipeline that implements rigorously studied meta-analysis best practices. We have also compiled and made publicly available, through a novel interactive web application, the results of our own multi-cohort gene expression analysis of 103 diseases, spanning 615 studies and 36,915 samples. As a result, both the process and the results of multi-cohort gene expression analysis are now more approachable for non-technical users.

