Mining MEDLINE: abstracts, sentences, or phrases?

Ding J, Berleant D, Nettleton D, Wurtele E

Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa 50011, USA.

Pac Symp Biocomput. 2002;:326-37.


A growing body of work addresses automated mining of biochemical knowledge from digital repositories of scientific literature, such as MEDLINE. Some of this work uses abstracts as the unit of text from which to extract facts. Other work uses sentences for this purpose, while still other work uses phrases. Here we compare abstracts, sentences, and phrases in MEDLINE using the standard information retrieval performance measures of recall, precision, and effectiveness, for the task of mining interactions among biochemical terms based on term co-occurrence. Results show statistically significant differences that can impact the choice of text unit.
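
The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the naive regex sentence splitter, the substring term matching, the toy text, and the gold-standard pair are all hypothetical. The abstract does not define "effectiveness", so the sketch assumes the balanced harmonic mean of precision and recall. Phrase-level units are omitted because they would require a parser.

```python
import re
from itertools import combinations


def split_units(text, unit):
    """Split text into units; the sentence splitter is a naive assumption."""
    if unit == "abstract":
        return [text]
    if unit == "sentence":
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    raise ValueError(f"unsupported unit: {unit}")


def cooccurring_pairs(text, terms, unit):
    """Return the set of term pairs that appear together in at least one unit."""
    pairs = set()
    for chunk in split_units(text, unit):
        lowered = chunk.lower()
        # Case-insensitive substring match; real term recognition is harder.
        present = sorted(t for t in terms if t.lower() in lowered)
        pairs.update(combinations(present, 2))
    return pairs


def score(predicted, gold):
    """Precision, recall, and their harmonic mean (assumed 'effectiveness')."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    effectiveness = 2 * precision * recall / total if total else 0.0
    return precision, recall, effectiveness


# Toy data, invented for illustration only.
text = "Alpha activates beta. Gamma was measured separately."
terms = {"alpha", "beta", "gamma"}
gold = {("alpha", "beta")}  # pairs assumed to truly interact

for unit in ("abstract", "sentence"):
    p, r, e = score(cooccurring_pairs(text, terms, unit), gold)
    print(f"{unit}: precision={p:.2f} recall={r:.2f} effectiveness={e:.2f}")
```

On this toy input the abstract-level unit proposes every pair of terms in the text, so recall stays high while precision drops. The sentence-level unit proposes only pairs that share a sentence. This is the kind of trade-off the three measures are meant to expose.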
