Incorporating Expert Terminology and Disease Risk Factors into Consumer Health Vocabularies

Michael Seedorff1, Kevin J. Peterson2, Laurie A. Nelsen3, Cristian Cocos4, Jennifer B. McCormick5, Christopher G. Chute6, Jyotishman Pathak7

1University of Iowa; 2Department of Information Technology, Mayo Clinic; 3Department of Information Technology, Mayo Clinic; 4Department of Information Technology, Mayo Clinic; 5Department of Information Technology, Mayo Clinic; 6Department of Information Technology, Mayo Clinic; 7Department of Information Technology, Mayo Clinic
Email: michael-seedorff

Pacific Symposium on Biocomputing 18:421-432(2013)


It is well known that the lay person seeking general health information, regardless of his or her education, cultural background, and economic status, is not as familiar with, or as comfortable using, the technical terms commonly used by healthcare professionals. One of the primary reasons is that patients and providers differ in their perspectives on, and understanding of, health vocabulary, even when referring to the same health concept. Consumer health vocabularies have been proposed as a way to bridge this “knowledge gap.” In this study, we introduce the Mayo Consumer Health Vocabulary (MCV), a taxonomy of approximately 5,000 consumer health terms and concepts, and develop text-mining techniques to expand its coverage by integrating disease concepts (from UMLS) as well as non-genetic (from deCODEme) and genetic (from GeneWiki+ and PharmGKB) risk factors for diseases. These steps added at least one synonym for 97% of MCV concepts, with an average of 43 consumer-friendly terms per concept. We were also able to associate risk factors with 38 common diseases and to establish 5,361 Disease:Gene pairings. The expanded MCV provides a robust resource for facilitating online health information searching and retrieval, as well as for building consumer-oriented healthcare applications.
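The abstract does not spell out the text-mining pipeline, so the following is only a minimal Python sketch, under stated assumptions, of the kind of synonym expansion and coverage bookkeeping summarized above. It assumes each MCV concept has already been mapped to an identifier shared with an external terminology such as UMLS. The function names `expand_vocabulary` and `coverage_stats`, the toy identifiers, and the simple lowercase/whitespace normalization are all hypothetical and are not the authors' method. The abstract also does not say whether "an average of 43 terms per concept" is taken over all concepts or only over concepts that gained a synonym; this sketch assumes the latter.

```python
from statistics import mean


def normalize(term):
    """Hypothetical normalization: lowercase and collapse whitespace."""
    return " ".join(term.lower().split())


def expand_vocabulary(mcv_concepts, source_synonyms):
    """Attach external synonyms to each consumer concept.

    mcv_concepts: dict of concept ID -> consumer term (assumed input).
    source_synonyms: dict of the same concept IDs -> lists of terms from
    an external terminology (a stand-in for UMLS; assumed input).
    Returns a dict of concept ID -> set of new normalized synonyms;
    concepts that gain no synonym are omitted.
    """
    expanded = {}
    for cid, term in mcv_concepts.items():
        own = normalize(term)
        new_terms = {
            normalize(syn)
            for syn in source_synonyms.get(cid, [])
            if normalize(syn) != own  # skip the concept's own term
        }
        if new_terms:
            expanded[cid] = new_terms
    return expanded


def coverage_stats(mcv_concepts, expanded):
    """Return (fraction of concepts with >= 1 synonym,
    mean synonyms per covered concept)."""
    covered = [cid for cid in mcv_concepts if expanded.get(cid)]
    frac = len(covered) / len(mcv_concepts) if mcv_concepts else 0.0
    avg = mean(len(expanded[c]) for c in covered) if covered else 0.0
    return frac, avg


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    mcv = {"C1": "heart attack", "C2": "high blood pressure", "C3": "pink eye"}
    syn = {
        "C1": ["Myocardial infarction", "MI", "heart  attack"],
        "C2": ["Hypertension", "HTN"],
    }
    expanded = expand_vocabulary(mcv, syn)
    print(expanded)
    print(coverage_stats(mcv, expanded))
```

On the toy data, two of the three concepts gain synonyms (coverage of 2/3), each with two new terms. The duplicate "heart  attack" entry is dropped because it normalizes to the concept's own term. A real pipeline would need richer normalization and a filter for how consumer-friendly each candidate term is, which this sketch does not attempt.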
