Merging heterogeneous clinical data to enable knowledge discovery

Martin G. Seneviratne¹, Michael G. Kahn², Tina Hernandez-Boussard³,*


¹Department of Biomedical Data Science, Stanford University
²Colorado Clinical and Translational Sciences Institute
³Department of Medicine, Biomedical Informatics, Stanford University
*Corresponding author
Email: martsen@stanford.edu, michael.Kahn@ucdenver.edu, boussard@stanford.edu

Pacific Symposium on Biocomputing 24:439-443(2019)

© 2019 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.


Abstract

The vision of precision medicine relies on the integration of large-scale clinical, molecular and environmental datasets. Data integration may be thought of along two axes: data fusion across institutions, and data fusion across modalities. Cross-institutional data sharing that maintains semantic integrity hinges on the adoption of data standards and a push toward ontology-driven integration. The goal should be the creation of queryable data repositories spanning primary and tertiary care providers, disease registries, and research organizations, producing rich longitudinal datasets. Cross-modality sharing involves the integration of multiple data streams, from structured EHR data (diagnosis codes, laboratory tests) to genomics, imaging, physiological monitors, and patient-generated data such as those from wearable devices. This integration presents unique technical, semantic, and ethical challenges; however, recent work suggests that multi-modal clinical data can significantly improve the performance of phenotyping and prediction algorithms, powering knowledge discovery at the patient and population levels.
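To make the idea of ontology-driven, cross-institutional integration concrete, the following is a minimal sketch, not the authors' method. It assumes two hypothetical sites that record the same laboratory tests under different local codes and units. Each site's codes are mapped to a shared concept label and values are converted to a canonical unit before records are pooled. The site names, local codes, concept labels (`serum_glucose`, `serum_creatinine`) and the helpers `harmonize` and `merge` are all invented for illustration. In practice the shared concepts would come from a standard terminology such as LOINC. The sketch also assumes patients are already linked by a common identifier, which real multi-institution data rarely provides without a separate record-linkage step.

```python
from dataclasses import dataclass

# Hypothetical local-code -> shared-concept mappings for two sites.
# Real systems would map to a standard terminology (e.g. LOINC for labs);
# every code and label here is invented for illustration.
SITE_MAPPINGS = {
    "site_a": {"GLU": "serum_glucose", "CREAT": "serum_creatinine"},
    "site_b": {"LAB0042": "serum_glucose", "LAB0107": "serum_creatinine"},
}

# Hypothetical factors converting a (concept, reported unit) pair to the
# concept's canonical unit (mg/dL here). Glucose mmol/L -> mg/dL is ~x18.
UNIT_FACTORS = {
    ("serum_glucose", "mg/dL"): 1.0,
    ("serum_glucose", "mmol/L"): 18.0,
    ("serum_creatinine", "mg/dL"): 1.0,
}


@dataclass(frozen=True)
class Observation:
    patient_id: str  # assumed to be a shared, already-linked identifier
    concept: str     # shared concept label, not the site's local code
    value: float     # value in the concept's canonical unit
    site: str        # provenance, kept so merged data stays traceable


def harmonize(site, records):
    """Map one site's raw records onto shared concepts and canonical units."""
    mapping = SITE_MAPPINGS[site]
    harmonized = []
    for rec in records:
        concept = mapping.get(rec["code"])
        if concept is None:
            # Unmapped local code: drop it rather than guess its meaning.
            continue
        factor = UNIT_FACTORS.get((concept, rec["unit"]))
        if factor is None:
            # An unknown unit would silently corrupt values, so fail loudly.
            raise ValueError(f"no conversion for {concept} in {rec['unit']}")
        harmonized.append(
            Observation(rec["patient_id"], concept, rec["value"] * factor, site)
        )
    return harmonized


def merge(*observation_lists):
    """Pool harmonized observations into one record per patient."""
    merged = {}
    for observations in observation_lists:
        for obs in observations:
            merged.setdefault(obs.patient_id, []).append(obs)
    return merged


# Usage: the same patient's glucose, reported differently by each site.
site_a_obs = harmonize(
    "site_a",
    [{"patient_id": "p1", "code": "GLU", "value": 99.0, "unit": "mg/dL"}],
)
site_b_obs = harmonize(
    "site_b",
    [
        {"patient_id": "p1", "code": "LAB0042", "value": 5.5, "unit": "mmol/L"},
        {"patient_id": "p1", "code": "LAB9999", "value": 1.0, "unit": "mg/dL"},
    ],
)
record = merge(site_a_obs, site_b_obs)

for obs in record["p1"]:
    print(obs.site, obs.concept, obs.value)
```

After harmonization, both glucose results resolve to the same concept and unit (5.5 mmol/L becomes 99.0 mg/dL), while the unmapped `LAB9999` code is dropped. Keeping a `site` field on each observation is one simple way to preserve provenance once data from several institutions are pooled.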

