Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning

Chun HW, Tsuruoka Y, Kim JD, Shiba R, Nagata N, Hishiki T, Tsujii J

Tsujii Laboratory, Room 615, 7th Building of Science, University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo, 113-0033, Japan
E-mail: {chun,tsuruoka,jdkim,tsujii}@is.s.u-tokyo.ac.jp, {rshiba,nnagata,t-hishiki}@jbirc.aist.go.jp


Pac Symp Biocomput. 2006;:4-15.


Abstract

We describe a system that extracts gene-disease relations from Medline. We constructed a dictionary of disease and gene names from six public databases and extracted relation candidates by dictionary matching. Because dictionary matching produces a large number of false positives, we developed a machine learning-based named entity recognition (NER) method to filter out false recognitions of disease and gene names. We found that the performance of relation extraction depends heavily on the performance of NER filtering, and that the filtering improves the precision of relation extraction by 26.7% at the cost of a small reduction in recall.
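
The two-stage idea described above (dictionary matching to generate candidates, then filtering out false name recognitions) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the function names, and the co-occurrence pairing are assumptions made for illustration, and the hand-written rule `toy_filter` is only a stand-in for the machine learning-based NER filter.

```python
import re
from itertools import product

# Toy dictionary (hypothetical entries; the paper built its dictionary
# from six public databases, which are not reproduced here).
DICTIONARY = {
    "gene": ["BRCA1", "p53", "CAT"],
    "disease": ["breast cancer", "cancer"],
}

def match_dictionary(sentence, dictionary):
    """Return (start, end, type, text) for longest non-overlapping matches."""
    hits = []
    for etype, names in dictionary.items():
        for name in names:
            # Match whole tokens only, ignoring case (a common source of false positives).
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            for m in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                hits.append((m.start(), m.end(), etype, m.group(0)))
    # Prefer longer matches; drop any hit overlapping one already kept.
    hits.sort(key=lambda h: (-(h[1] - h[0]), h[0]))
    kept = []
    for h in hits:
        if all(h[1] <= k[0] or h[0] >= k[1] for k in kept):
            kept.append(h)
    return sorted(kept)

def candidate_relations(mentions):
    """Pair every gene mention with every disease mention in the sentence."""
    genes = [m for m in mentions if m[2] == "gene"]
    diseases = [m for m in mentions if m[2] == "disease"]
    return list(product(genes, diseases))

def toy_filter(mention):
    """Stand-in for the ML-based NER filter (assumption, not the paper's model):
    reject gene mentions written entirely in lowercase."""
    _, _, etype, text = mention
    return not (etype == "gene" and text.islower())

sentence = "The cat study linked BRCA1 to breast cancer."
mentions = match_dictionary(sentence, DICTIONARY)
unfiltered = candidate_relations(mentions)
filtered = candidate_relations([m for m in mentions if toy_filter(m)])
print(len(unfiltered), len(filtered))
```

In this toy sentence, the word "cat" matches the gene name "CAT" and yields a spurious candidate; filtering the mentions before pairing removes it, which is the mechanism by which mention-level filtering can raise relation-level precision.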

