Multiple Approaches to Fine-Grained Indexing of the Biomedical Literature

Neveol A, Shooshan SE, Humphrey SM, Rindflesch TC, Aronson AR

National Library of Medicine, NIH, Bethesda, MD 20894, USA
Equipe CISMeF, Rouen, France

Pac Symp Biocomput. 2007:292-303.


The number of articles in the MEDLINE database is expected to increase tremendously in the coming years. To ensure that all these documents continue to be indexed with high quality, it is necessary to develop tools and methods that help the indexers in their daily task. We present three methods addressing a novel aspect of automatic indexing of the biomedical literature, namely producing MeSH main heading/subheading pair recommendations. The methods (dictionary-based, post-processing rules, and Natural Language Processing rules) are described and evaluated on a genetics-related corpus. The best overall performance is obtained for the subheading genetics (70% precision and 17% recall with post-processing rules; 48% precision and 37% recall with the dictionary-based method). Future work will address extending this work to all MeSH subheadings and a more thorough study of method combination.
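
For readers unfamiliar with how precision and recall apply to main heading/subheading pair recommendations, the following is a minimal sketch and not the authors' evaluation code. It assumes that each recommendation is scored as an exact match against the pairs assigned by the indexers; the function name `precision_recall` and the sample pairs are hypothetical and chosen only for illustration.

```python
def precision_recall(recommended, reference):
    """Return (precision, recall) of recommended pairs against reference pairs.

    Both arguments are iterables of (main_heading, subheading) tuples.
    Assumption: a recommendation counts as correct only on an exact pair match.
    """
    recommended, reference = set(recommended), set(reference)
    true_positives = len(recommended & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical pairs, invented for this example (not taken from the paper).
reference = {
    ("Neoplasms", "genetics"),
    ("Mutation", "genetics"),
    ("Mice", "genetics"),
    ("Proteins", "metabolism"),
}
recommended = {
    ("Neoplasms", "genetics"),
    ("Proteins", "genetics"),
}

precision, recall = precision_recall(recommended, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one of the two recommended pairs is correct and one of the four reference pairs is recovered, which illustrates the kind of trade-off reported above: a method can be accurate in what it recommends while still missing many of the pairs the indexers selected.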
