Evaluation of Linear Classifiers on Articles Containing Pharmacokinetic Evidence of Drug-Drug Interactions


Artemy Kolchinsky1, Analia Lourenco2, Lang Li3, Luis M. Rocha4



1School of Informatics and Computing, Indiana University; 2Institute for Biotechnology & Bioengineering, Centre of Biological Engineering, University of Minho; 3Department of Medical and Molecular Genetics, Indiana University School of Medicine; 4School of Informatics and Computing, Indiana University
Email: akolchin@indiana.edu

Pacific Symposium on Biocomputing 18:409-420(2013)


Abstract

Background. Drug-drug interaction (DDI) is a major cause of morbidity and mortality. DDI research includes the study of different aspects of drug interactions, from in vitro pharmacology, which deals with drug interaction mechanisms, to pharmaco-epidemiology, which investigates the effects of DDI on drug efficacy and adverse drug reactions. Biomedical literature mining can aid both kinds of approaches by extracting relevant DDI signals from either the published literature or large clinical databases. However, though drug interaction is an ideal area for translational research, the inclusion of literature mining methodologies in DDI workflows is still very preliminary. One area that can benefit from literature mining is the automatic identification of a large number of potential DDIs, whose pharmacological mechanisms and clinical significance can then be studied via in vitro pharmacology and in populo pharmaco-epidemiology.

Experiments. We implemented a set of classifiers for identifying published articles relevant to experimental pharmacokinetic DDI evidence. These documents are important for identifying causal mechanisms behind putative drug-drug interactions, an important step in the extraction of large numbers of potential DDIs. We evaluate performance of several linear classifiers on PubMed abstracts, under different feature transformation and dimensionality reduction methods. In addition, we investigate the performance benefits of including various publicly-available named entity recognition features, as well as a set of internally-developed pharmacokinetic dictionaries.

Results. We found that several classifiers performed well in distinguishing relevant and irrelevant abstracts. We found that the combination of unigram and bigram textual features gave better performance than unigram features alone, and also that normalization transforms that adjusted for feature frequency and document length improved classification. For some classifiers, such as linear discriminant analysis (LDA), proper dimensionality reduction had a large impact on performance. Finally, the inclusion of NER features and dictionaries was found not to help classification.
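
As an illustration of the feature pipeline described in the Results, the following minimal Python sketch builds combined unigram and bigram features and applies a transform that adjusts for feature frequency and document length. The abstract does not name the specific transforms used, so the smoothed TF-IDF weighting and L2 length normalization here are assumptions, as are the function names `ngrams` and `tfidf_vectors` and the toy input strings.

```python
import math
from collections import Counter


def ngrams(text):
    """Return the lowercase unigrams and bigrams of a whitespace-tokenized text."""
    words = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return words + bigrams


def tfidf_vectors(docs):
    """Map each document to an L2-normalized TF-IDF dict over unigrams and bigrams.

    Assumed transform (not taken from the paper): smoothed IDF adjusts for
    feature frequency, and L2 normalization adjusts for document length.
    """
    counts = [Counter(ngrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each feature occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    vectors = []
    for c in counts:
        weighted = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weighted.values()))
        vectors.append({t: w / norm for t, w in weighted.items()} if norm else {})
    return vectors


# Toy strings made up for illustration; they are not data from the study.
abstracts = [
    "ketoconazole increased midazolam auc",
    "midazolam clearance decreased with ketoconazole",
    "gene expression in yeast",
]
vectors = tfidf_vectors(abstracts)
```

Each resulting sparse vector could then be passed to a linear classifier. Features shared across many documents receive lower weight than rarer ones, and every document vector has unit length regardless of how long the abstract is.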

