EDGAR: extraction of drugs, genes and relations from the biomedical literature

Rindflesch TC, Tanabe L, Weinstein JN, Hunter L

Lister Hill Center, National Library of Medicine, Bethesda, MD 20894, USA. tcr@lhc.nlm.nih.gov

Pac Symp Biocomput. 2000;:517-28.


Abstract

EDGAR (Extraction of Drugs, Genes and Relations) is a natural language processing system that extracts information about drugs and genes relevant to cancer from the biomedical literature. This automatically extracted information has remarkable potential to facilitate computational analysis in the molecular biology of cancer, and the technology is straightforwardly generalizable to many areas of biomedicine. This paper reports on the mechanisms for automatically generating assertions about drugs, genes and their relations, and on a simple application, conceptual clustering of documents. The system uses a stochastic part-of-speech tagger, generates an underspecified syntactic parse and then uses semantic and pragmatic information to construct its assertions. The system builds on two important existing resources: the MEDLINE database of biomedical citations and abstracts, and the Unified Medical Language System (UMLS), which provides syntactic and semantic information about the terms found in biomedical abstracts.
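The three processing stages named in the abstract (stochastic tagging, underspecified parsing, semantic interpretation) can be illustrated with a toy pipeline. This is a minimal sketch under stated assumptions, not EDGAR's implementation: the hidden Markov model probabilities, the tag set, the hypothetical `SEM_TYPES` lexicon (a stand-in for UMLS semantic types) and the single "drug VERB gene" relation pattern are all invented for illustration. Tagging uses the Viterbi algorithm over a bigram HMM; the "underspecified parse" is approximated by grouping runs of determiners, adjectives and nouns into noun-phrase chunks without attaching them to one another.

```python
import math

# Hypothetical toy HMM parameters (NOT EDGAR's tagger model).
TAGS = ["DT", "JJ", "NN", "VB"]
TRANS = {  # P(tag | previous tag); "<s>" marks the sentence start
    "<s>": {"DT": 0.3, "JJ": 0.1, "NN": 0.5, "VB": 0.1},
    "DT": {"DT": 0.01, "JJ": 0.3, "NN": 0.68, "VB": 0.01},
    "JJ": {"DT": 0.01, "JJ": 0.2, "NN": 0.78, "VB": 0.01},
    "NN": {"DT": 0.05, "JJ": 0.05, "NN": 0.3, "VB": 0.6},
    "VB": {"DT": 0.5, "JJ": 0.2, "NN": 0.29, "VB": 0.01},
}
EMIT = {  # P(word | tag); unseen words fall back to FLOOR
    "DT": {"the": 0.6, "a": 0.4},
    "JJ": {"mutant": 0.5, "high": 0.5},
    "NN": {"tamoxifen": 0.2, "cisplatin": 0.2, "p53": 0.2,
           "erbb2": 0.2, "expression": 0.2},
    "VB": {"inhibits": 0.5, "induces": 0.5},
}
FLOOR = 1e-6

# Hypothetical lexicon standing in for UMLS semantic types.
SEM_TYPES = {"tamoxifen": "drug", "cisplatin": "drug",
             "p53": "gene", "erbb2": "gene"}


def viterbi(words):
    """Return the most probable tag sequence under the toy bigram HMM."""
    def emit(tag, w):
        return math.log(EMIT[tag].get(w, FLOOR))

    # best[i][tag] = (log prob of best path ending in tag at i, previous tag)
    best = [{t: (math.log(TRANS["<s>"][t]) + emit(t, words[0]), None)
             for t in TAGS}]
    for w in words[1:]:
        row = {}
        for t in TAGS:
            score, prev = max((best[-1][p][0] + math.log(TRANS[p][t]), p)
                              for p in TAGS)
            row[t] = (score + emit(t, w), prev)
        best.append(row)
    # Backtrack from the best final tag through the stored backpointers.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for row in reversed(best[1:]):
        tag = row[tag][1]
        path.append(tag)
    return path[::-1]


def chunk(words, tags):
    """Approximate an underspecified parse: group DT/JJ/NN runs into NPs."""
    chunks, current = [], []
    for w, t in zip(words, tags):
        if t in ("DT", "JJ", "NN"):
            current.append(w)
        else:
            if current:
                chunks.append(("NP", current))
                current = []
            chunks.append((t, [w]))
    if current:
        chunks.append(("NP", current))
    return chunks


def semantic_type(np_words):
    """Type a noun phrase by the first word found in the toy lexicon."""
    for w in np_words:
        if w in SEM_TYPES:
            return SEM_TYPES[w]
    return None


def extract(sentence):
    """Emit (drug, verb, gene) assertions for adjacent NP-VB-NP chunks."""
    words = sentence.lower().split()
    chunks = chunk(words, viterbi(words))
    assertions = []
    for (k1, np1), (k2, verb), (k3, np2) in zip(chunks, chunks[1:], chunks[2:]):
        if (k1, k2, k3) == ("NP", "VB", "NP") and \
                semantic_type(np1) == "drug" and semantic_type(np2) == "gene":
            assertions.append((" ".join(np1), verb[0], " ".join(np2)))
    return assertions
```

For example, `extract("Cisplatin induces p53 expression")` tags the tokens as NN VB NN NN, chunks them into [cisplatin] [induces] [p53 expression], and returns `[("cisplatin", "induces", "p53 expression")]`. A real system would estimate the tagger from annotated text and draw semantic types from UMLS rather than a hand-written dictionary.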

