Methods for Examining Data Quality in Healthcare Integrated Data Repositories

Co-organizers: Vojtech Huser, Michael Kahn, and Jeffrey Brown

Large Integrated Data Repositories (IDRs) have become indispensable for clinical research. The recent emergence of Common Data Models (CDMs) has facilitated the creation of tools that provide syntactic integration (a shared information model) and, in some cases, semantic integration (a shared set of target terminologies used by structured data). Retrospective data analyses are increasingly executed on multiple datasets, and distributed research networks are creating reusable tools that streamline data wrangling, data repository maintenance, and data analytics.

Examples of large, well-coordinated IDRs developed using a CDM and a distributed network approach include the Vaccine Safety Datalink (9 sites focused on vaccine safety), the Health Care Systems Research Network (multi-purpose research network of 18 sites), the FDA Sentinel Initiative (18 sites representing billions of medical encounters to support medical product safety surveillance), MDPHnet (public health surveillance network in Massachusetts), and PCORnet (over 70 sites with millions of encounters to support clinical research).

Each of these distributed networks has its own approach to addressing data quality, with some elements shared across networks, and each has developed tools to facilitate data quality querying. However, the data quality approaches and tools of these networks have not all been well documented, and/or they are not readily available or easily usable by others. Well-documented, readily available data quality software tools and methods are an emerging need in the new data environments being developed to support clinical research. For example, the Achilles tool created by the OHDSI (Observational Health Data Sciences and Informatics) consortium allows sites that have converted their data to the Observational Medical Outcomes Partnership (OMOP) CDM to readily execute a set of data quality rules and data characterization pre-computations.
OHDSI supports active, open community engagement in developing new tools and adding new functionality to existing tools, which has enabled the Achilles tool to expand its data quality assessment capabilities based on community needs and interests.

Data Quality Topics in Scope

The workshop goal is to exchange novel approaches for evaluating data quality (DQ) and innovative ways of reporting DQ findings in a standardized, readily accessible format across multiple data partners. Specific topics are:

Contact: Vojtech Huser
Email: vojtech dot huser at nih dot gov