Merging Heterogeneous Data to Enable Knowledge Discovery

Contact Chair: Tina Hernandez-Boussard¹
Co-Chair: Michael Kahn²

¹ Department of Medicine (Biomedical Informatics), Stanford University
² Department of Pediatrics, University of Colorado Denver Anschutz Medical Campus

Description:

The digitalization of high-value information is generating measurements on dynamic processes, interactions, and systems that cross multiple orders of magnitude. Innovative results are beginning to emerge from linking, integrating, and harmonizing data and knowledge across previously independent sources. For this workshop, we invite submissions that highlight new results from linking and integrating data and knowledge across heterogeneous sources (e.g., electronic medical records, geocoded data, genetic information, social media).

Motivation:

The "digitalization of everything" is generating digital measurements on dynamic processes, interactions, and systems that cross multiple orders of magnitude. Weber, Mandl, and Kohane (2016) describe an extensive existing data ecosystem that they call the "Tapestry of high-value information sources." Wearable multi-sensors record continuous real-time measurements of personal biological processes and physiological responses, while digital homes and sensors capture everyday environmental exposures. At the same time, a similar explosion is occurring in digital knowledge through electronic publication, large-scale ontology development, and knowledge grids within a learning healthcare system. Collectively, these sources capture knowledge on processes, interactions, and systems across physical, temporal, and systems scales that were never before available. An aggressive effort by the open data community and funding agencies seeks to ensure that these extensive digital assets are liquid, transparent, and linkable.

Call for Abstracts:

We invite researchers from different fields to present high-impact research in these areas, including (but not limited to):

Innovations in linking and integrating resources (data fusion) at all levels of the biological, clinical, and environmental/exposure scales.
Efforts to create novel data and knowledge assets that enable findings not possible with a single source.
Demonstrations of how integrated or linked data are more robust to data quality anomalies that could prevent discovery (missingness, bias, noisiness).
Following the PSB2018 Workshop on Diversity and Disparity, examples where combined data sets enabled the study of populations that are not adequately represented in medical research, so that these populations can benefit from research findings.
Development and use of novel methods and tools for data fusion, knowledge extraction, and visualization of disparate data.

Interested researchers should submit a one-page abstract of their presentation to boussard AT Stanford.edu by August 1, 2018. References and one figure are optional and do not count toward the page limit. Invitations to present will be sent out by August 15, 2018.

Contact: Tina Hernandez-Boussard
Email: boussard AT Stanford.edu