PSB 2017 Workshop: Open Data for Discovery Science

Philip R.O. Payne, PhD¹; Kun Huang, PhD²; Nigam H. Shah, MBBS, PhD³; Jessica Tenenbaum, PhD⁴

¹Washington University in St. Louis, Institute for Informatics

²The Ohio State University, Department of Biomedical Informatics

³Stanford University, Center for Biomedical Informatics Research

⁴Duke University, Department of Biostatistics and Bioinformatics

Introduction:

The modern healthcare and life sciences ecosystem is moving towards an increasingly open and data-centric approach to discovery science. This evolving paradigm is predicated on a complex set of information needs related to our collective ability to share, discover, reuse, integrate, and analyze open biological, clinical, and population-level data resources of varying composition, granularity, and syntactic or semantic consistency. Such an evolution is further impacted by a concomitant growth in the size of data sets that can and should be employed for both hypothesis discovery and testing. When such open data can be accessed and employed for discovery purposes, a broad spectrum of high-impact end-points is made possible. These range from the identification of de novo biomarker complexes that can inform precision medicine, to the repositioning or repurposing of extant agents for new and cost-effective therapies, to the assessment of population-level influences on disease and wellness. Of note, these uses of open data can be either primary, wherein open data is the substantive basis for inquiry, or secondary, wherein open data is used to augment or enrich project-specific or proprietary data that is not open in and of itself.

Given these opportunities and the current state of knowledge concerning the use of open data across and between types and scales for the purposes of discovery science, this workshop will address:

1. The state-of-the-art in terms of tools and methods targeting the use of open data for discovery science, including but not limited to syntactic and semantic standards, platforms for data sharing and discovery, and computational workflow orchestration technologies that enable the creation of data analytics "pipelines"

2. Practical approaches for the automated and/or semi-automated harmonization, integration, analysis, and presentation of "data products" to enable hypothesis discovery or testing

3. Frameworks for the application of open data to support or enable hypothesis generation and testing in projects spanning the basic, translational, clinical, and population health research and practice domains (e.g., from molecules to populations).

Workshop Rationale:

PSB 2017 is specifically intended to provide a forum in which participants can present "work in databases, algorithms, interfaces, natural language processing, modeling and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology." In addition, "A major goal of PSB is to create productive interaction among the rather different research cultures of computer science and biology." As such, PSB 2017 provides an ideal venue for a vigorous and highly productive exchange of knowledge and ideas surrounding the current and future directions for the use of open data to support or enable discovery science, an area which by its nature involves:

1. The creation, verification and validation of tools and methods that can assist in the sharing, discovery, and analysis of open data in a primary or secondary manner, including the development of databases, algorithms, and modeling techniques therein

2. The conduct of discovery science in data-intensive experimental contexts that leverage such open data resources

3. The interaction of multidisciplinary computational, biology, clinical, and population health science teams to conduct research that serves to translate such discovery into patient-level or broader intervention strategies to improve human health and wellness.

Workshop Organizers:

Philip R.O. Payne, PhD, FACMI

Dr. Payne is the Founding Director of the Institute for Informatics at Washington University in St. Louis. Previously, he served as Professor and Chair of the Department of Biomedical Informatics at The Ohio State University (OSU), where he was also the inaugural Director of Translational Data Analytics @ OSU, a campus-wide program to create a singular presence in applied data analytics at one of the nation's largest land-grant universities. Dr. Payne is an internationally recognized leader in the fields of clinical research informatics (CRI) and translational bioinformatics (TBI). His research portfolio is actively supported by a combination of NCATS, NLM, and NCI grants and contracts, as well as a variety of awards from both non-profit and philanthropic organizations. Dr. Payne received his Ph.D. with distinction in Biomedical Informatics from Columbia University, where his research focused on the use of knowledge engineering and human-computer interaction design principles to improve the efficiency of multi-site clinical and translational research programs. Dr. Payne is also the co-founder of Signet Accel LLC, a healthcare information technology start-up that delivers advanced data sharing and interoperability solutions to the healthcare delivery, translational research, and bio-pharmaceutical sectors. He is an elected fellow of the American College of Medical Informatics (ACMI), and serves as a consultant and advisor to a broad spectrum of academic, government, and private sector informatics and data science initiatives at the international level.

Kun Huang, PhD

Dr. Kun Huang is Professor in Biomedical Informatics, Computer Science and Engineering, and Biostatistics at The Ohio State University (OSU). He is also the Division Director for Bioinformatics and Computational Biology in the OSU Department of Biomedical Informatics as well as Associate Dean for Genomic Informatics in the OSU College of Medicine. His research program focuses on developing bioinformatics tools for systems biology and translational research. He has developed many methods for analyzing and integrating various types of high-throughput biomedical data, including gene expression microarray, next generation sequencing (NGS), qRT-PCR, proteomics, and microscopic imaging experiments. These methods have been successfully applied to research projects on different diseases and conditions such as cancers, fibrosis, cardiovascular diseases, wound healing, inflammatory bowel diseases, and idiopathic pulmonary fibrosis (IPF). Recently he has been awarded multiple grants for developing integrative genomics software and algorithms for disease biomarker and therapeutic target discovery. Dr. Kun Huang received his BS degree in Biological Sciences from Tsinghua University in 1996 and his MS degrees in Physiology, Electrical Engineering, and Mathematics, all from the University of Illinois at Urbana-Champaign (UIUC). He then received his PhD in Electrical and Computer Engineering from UIUC in 2004 with a focus on computer vision and machine learning.

Nigam Shah, MBBS, PhD, FACMI

Dr. Nigam Shah is Associate Professor of Medicine (Biomedical Informatics) at Stanford University, Assistant Director of the Center for Biomedical Informatics Research, and a core member of the Biomedical Informatics Graduate Program. Dr. Shah's research focuses on combining machine learning and prior knowledge in medical ontologies to enable use cases of the learning health system. Dr. Shah received the AMIA New Investigator Award for 2013 and the Stanford Biosciences Faculty Teaching Award for outstanding teaching in his graduate class on "Data driven medicine" (Biomedin 215). Dr. Shah was elected to the American College of Medical Informatics (ACMI) in 2015 and to the American Society for Clinical Investigation (ASCI) in 2016. He holds an MBBS from Baroda Medical College, India, and a PhD from Penn State University, and completed postdoctoral training at Stanford University. More at: https://med.stanford.edu/profiles/nigam-shah.

Jessica Tenenbaum, PhD

Dr. Tenenbaum is Assistant Professor in the Division of Translational Biomedical Informatics, Department of Biostatistics and Bioinformatics at Duke University, and Associate Director for Bioinformatics for the Duke Translational Medicine Institute. Her primary areas of research are: 1) Infrastructure and standards to enable research collaboration and integrative data analysis; 2) Informatics to enable precision medicine; and 3) Ethical, legal, and social issues that arise in translational research, direct-to-consumer genetic testing, and data sharing. At Duke, Dr. Tenenbaum oversaw development of the MURDOCK Integrated Data Repository (MIDR) for management and integration of clinical, demographic, omic, and bio-specimen data for the MURDOCK Study (www.murdock-study.com) and related ancillary studies. Nationally, Dr. Tenenbaum plays a leadership role in the American Medical Informatics Association, serving as Chair of the Genomics and Translational Bioinformatics Working Group and as an elected member of the Board of Directors. She is an Associate Editor for the Journal of Biomedical Informatics and serves on the advisory panel for Nature Publishing Group's Scientific Data initiative. After earning her bachelor's degree in biology from Harvard, Dr. Tenenbaum worked as a program manager at Microsoft Corporation in Redmond, WA for six years before pursuing a PhD in biomedical informatics at Stanford University.