Haoran Zhang1,2,3,*, Elisa Candido3, Andrew S. Wilton3, Raquel Duchen3, Liisa Jaakkimainen3, Walter Wodchis3,4,5, Quaid Morris1,2,6,7,*
1Department of Computer Science, University of Toronto
2Vector Institute for Artificial Intelligence, Toronto
3ICES
4Institute of Health Policy, Management, and Evaluation, University of Toronto
5Institute for Better Health, Trillium Health Partners
6Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto
7Department of Molecular Genetics, University of Toronto
*Corresponding author
Email: haoran@cs.toronto.edu, quaid.morris@utoronto.ca
Pacific Symposium on Biocomputing 25:127-138 (2020)
© 2020 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.
Identifying patients at risk of becoming High Cost Users (HCUs), and intervening before they do, presents an opportunity to improve outcomes while also yielding significant savings for the healthcare system. In this paper, we predict the 2016 HCU status of patients using free-form text from the 2015 cumulative patient profiles within the electronic medical records of family care practices in Ontario. These unstructured notes make substantial use of domain-specific spellings and abbreviations; we show that word embeddings derived from the same context provide more informative features than pre-trained ones based on Wikipedia, MIMIC, and PubMed. We further demonstrate that a model using features derived from aggregated word embeddings (EmbEncode) provides a significant performance improvement over the bag-of-words representation (82.48±0.35% versus 81.85±0.36% held-out AUROC, p = 3.2 × 10⁻⁴), while using far fewer input features (5,492 versus 214,750) and fewer non-zero coefficients (1,177 versus 4,284). The future HCUs of greatest interest are the transitional ones who are not already HCUs, because they offer the greatest scope for intervention. Predicting these new HCUs is challenging because most HCUs recur. We show that removing recurrent HCUs from the training set improves EmbEncode's ability to predict new HCUs, while only slightly decreasing its ability to predict recurrent ones.
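The core idea of aggregating word embeddings into a fixed-length note representation can be sketched as follows. This is a minimal illustration using mean pooling over a toy embedding table; the tokens, vectors, and pooling choice here are assumptions for illustration, not the paper's actual EmbEncode pipeline, whose embeddings would be trained on the clinical notes themselves so that domain-specific abbreviations receive useful vectors.

```python
# Sketch: turn a free-text note into a fixed-length feature vector by
# averaging the embeddings of its tokens, as an alternative to a sparse
# bag-of-words vector with one dimension per vocabulary word.
import numpy as np

# Toy 4-dimensional embedding table (illustrative values only).
EMBED = {
    "htn":      np.array([0.9, 0.1, 0.0, 0.2]),  # hypertension
    "dm2":      np.array([0.8, 0.2, 0.1, 0.0]),  # type 2 diabetes
    "followup": np.array([0.0, 0.7, 0.3, 0.1]),
}
DIM = 4

def embed_note(text):
    """Mean-pool the embeddings of known tokens; unknown tokens are skipped."""
    vecs = [EMBED[t] for t in text.lower().split() if t in EMBED]
    if not vecs:
        return np.zeros(DIM)
    return np.mean(vecs, axis=0)

# The output dimension is fixed (here 4) regardless of note length,
# which is why the embedding-based representation needs far fewer
# input features than bag-of-words over the full vocabulary.
features = embed_note("HTN DM2 followup")
print(features.shape)
```

A downstream classifier (e.g. logistic regression) would then be fit on these dense note vectors instead of the high-dimensional bag-of-words counts.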