Mining the Pharmacogenomics Literature



The aim of this workshop is to bring together researchers working on the automatic or semi-automatic extraction of relationships between biomedical entities from research literature. The workshop will focus particularly on methods for the extraction of genotype-phenotype, genotype-drug, and phenotype-drug relationships and the use of the relationships for advancing pharmacogenomic research. Efforts aimed at creating benchmark corpora as well as comparative evaluation of existing relationship extraction methods are of special interest.


Pharmacogenomics is both timely and important. The promise that it holds for individualized medicine may be on the verge of realization, thanks to technical advances such as large SNP microarrays and analytical advances that allow us to predict beneficial, non-beneficial, and deleterious drugs for specific individuals based on aspects of both the individual and the drug.

However, information management in this field relies on fairly traditional means, e.g. curated databases, which do not scale to (1) the rapid expansion of the pharmacogenomics literature in recent years and (2) the growing volume of available full-text publications, which contain more specific and (potentially) more informative facts than Medline abstracts. Hence, although there is large demand for text analytics in the study of pharmacogenomics, and significant utility to be gained from it, its potential is not fully realized. This is in part because the work to date has failed to bridge two distinct worlds, that of (bench) molecular biology and that of (clinically oriented) pharmacology, and in part because the developers of text analytics are not fully aware of this challenging field.

Last year's workshop (Genotype-Phenotype-Drug relationship extraction from text) examined the current state of the art and reported on ongoing research in labs already working in this area. Three trends indicate that the time is ripe for research that goes beyond the mere extraction of explicitly stated knowledge in documents, to linking text-mined and database data through formal reasoning in order to uncover implicit and in some sense "new" knowledge: the steady stream of work on extracting interactions from text; the increasing attention in the Semantic Web towards capturing facts as "nano-publications" (individual assertions that are attributable to authors and traceable in their publications); and efforts to represent scientific discourse in a structured manner.

In order to advance this agenda, it is essential that existing relationship extraction methods be compared to one another and that a community-wide, shareable benchmark corpus emerge against which such efforts can be measured. The goal of the workshop is to use a corpus put forth by PharmGKB to compare different relationship extraction methods and the "new" knowledge discovery they might drive.

This workshop aims to address the gap in coverage of text mining for pharmacogenomics. Its technical focus is genotype-phenotype-drug relationships. It will include broad categories of work that have been well studied in the past, specifically text mining and reasoning, but will restrict submissions to applications of that work to pharmacogenomics. Solicited topics include, for example:

      Relation extraction among genotypes, phenotypes, drugs, and other semantic classes relevant to pharmacogenomics

      Corpus development for pharmacogenomics text mining

      Linking gene variants (mutations, alleles, rs/ss numbers) to their associated gene names

      Work on the corpus of documents linked to by PharmGKB

      Reasoning systems applied over the PharmGKB knowledge base


Work on named entity recognition (e.g. gene taggers) will not be considered for inclusion. Approaches that combine text mining and knowledge-based systems are of special interest.


We are soliciting both research and position abstracts (up to 500 words) related to the topics mentioned above. The workshop will combine invited talks, talks selected from abstract submissions to this call, and a panel discussion. Submitted abstracts will be reviewed by the co-chairs, who will select the submitted talks. Authors of all accepted abstracts will be invited to submit full papers for publication in a yet-to-be-determined journal.


Please submit abstracts to with the subject line PSB workshop submission.


Abstract deadline:                   August 31, 2010

Speaker notification:               September 15, 2010

Workshop:                              TBA, sometime during January 3-7, 2011


Kevin Bretonnel Cohen

Yael Garten

Udo Hahn

Nigam H. Shah