Learning Contextual Hierarchical Structure of Medical Concepts with Poincaré Embeddings to Clarify Phenotypes

Brett K. Beaulieu-Jones, Isaac S. Kohane, Andrew L. Beam*


Department of Biomedical Informatics, Harvard Medical School
*Corresponding author
Email: Andrew_Beam@hms.harvard.edu

Pacific Symposium on Biocomputing 24:8-17(2019)

© 2019 World Scientific
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution (CC BY) 4.0 License.


Abstract

Biomedical association studies increasingly use clinical concepts as phenotypes, in particular diagnostic codes from clinical data repositories. Clinical concepts can be represented in a meaningful vector space using word embedding models. These embeddings allow clinical concepts to be compared with one another or passed directly as input to machine learning models. With traditional approaches, good representations require high dimensionality, which makes downstream tasks such as visualization more difficult. We applied Poincaré embeddings in a 2-dimensional hyperbolic space to a large-scale administrative claims database and show performance comparable to 100-dimensional embeddings in a Euclidean space. We then examine disease relationships under different disease contexts to better understand potential phenotypes.
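To show why a 2-dimensional hyperbolic space can hold hierarchical structure that would otherwise need many Euclidean dimensions, here is a minimal sketch of the standard geodesic distance on the Poincaré unit ball, the geometry Poincaré embeddings live in. This is an illustration of the distance formula only, not the authors' training code. The function name `poincare_distance` and the example points are hypothetical.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball.

    Hypothetical helper implementing the standard formula:
    d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
    """
    def sq_norm(x):
        return sum(xi * xi for xi in x)

    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

# Hypothetical 2-D points: a general concept near the origin, two specific ones near the boundary.
root = (0.0, 0.0)
leaf_a = (0.9, 0.0)
leaf_b = (0.0, 0.9)

print(poincare_distance(root, leaf_a))    # distance from the general concept to a specific one
print(poincare_distance(leaf_a, leaf_b))  # two specific concepts are far apart, nearly as if routed via the root
```

Distances grow rapidly near the boundary of the ball, so general concepts tend to sit near the center and specific ones near the edge. That lets tree-like relationships be laid out in only two dimensions, which can then be plotted directly.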

