Study of effect of drug lexicons on medication extraction from electronic medical records

Sirohi E, Peissig P

Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA. sirohi.ekta@marshfieldclinic.org

Pac Symp Biocomput. 2005;:308-18.


Abstract

Extracting relevant information from free-text clinical notes is becoming increasingly important in healthcare as a means of providing personalized care to patients. The purpose of this dictionary-based NLP study was to determine the effect of varying the drug lexicon used to automatically extract medication information from electronic medical records. A convenience training sample of 52 documents, each containing at least one medication, and a randomized test sample of 100 documents were used. The training and test documents contained a total of 681 and 641 medications, respectively. Three drug lexicons were used as sources for medication extraction: the first contained drug names and generic names; the second contained drug, generic, and short names; the third contained drug, generic, and short names and was followed by filtering techniques. Extraction with the first lexicon resulted in 83.7% sensitivity and 96.2% specificity for the training set and 85.2% sensitivity and 96.9% specificity for the test set. Adding the list of short names used for drugs increased sensitivity to 95.0% but decreased specificity to 79.2% for the training set. Similar results, 96.4% sensitivity and 80.1% specificity, were obtained for the test set. Combining a set of filtering techniques with the second lexicon increased specificity to 98.5% and 98.8% for the training and test sets, respectively, while slightly decreasing sensitivity to 94.1% (training) and 95.8% (test). Overall, the lexicon with filtering gave the best overall performance of the three: it extracted nearly as many medications as the unfiltered second lexicon while keeping the number of extracted non-medications low.
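The three lexicon configurations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the matching procedure or the filtering techniques, so whole-word, case-insensitive lookup and a simple stop-list are assumed here as stand-ins. The function names, the sample note, and the lexicon entries (including "mom" as a short name for milk of magnesia) are hypothetical and chosen only to show why short names can raise sensitivity while lowering specificity.

```python
import re


def extract_medications(text, lexicon, stop_terms=frozenset()):
    """Return the lexicon terms that occur in text as whole words.

    Matching is case-insensitive. Terms listed in stop_terms are skipped;
    this stop-list is an assumed stand-in for the study's unspecified
    filtering techniques.
    """
    lowered = text.lower()
    blocked = {t.lower() for t in stop_terms}
    found = set()
    for term in lexicon:
        term = term.lower()
        if term in blocked:
            continue
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(term)
    return found


def sensitivity(true_pos, false_neg):
    """Fraction of actual medications that were extracted."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-medications that were correctly left out."""
    return true_neg / (true_neg + false_pos)


# Hypothetical note and lexicons, for illustration only.
note = ("Patient started on Lisinopril 10 mg daily and ASA 81 mg. "
        "Her mom will help with dosing.")
drug_and_generic = {"lisinopril", "aspirin"}
with_short_names = drug_and_generic | {"asa", "mom"}

first = extract_medications(note, drug_and_generic)
second = extract_medications(note, with_short_names)
third = extract_medications(note, with_short_names, stop_terms={"mom"})
```

In this toy note, the first lexicon misses the abbreviated aspirin mention, the second picks it up but also flags the ordinary word "mom", and the stop-list removes that false positive while keeping the abbreviation.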

