Call For Papers

Linking biomedical information through text mining

A Pacific Symposium on Biocomputing Session
January 3-7, 2006
Grand Wailea Resort, Wailea, Maui, Hawai'i


Motivation for this session

The past few years have seen a number of conference sessions and journal papers on the topic of biomedical language processing, including PSB sessions in 2000, 2001, 2002, and 2003. Most of the systems discussed in this body of work have identified entities or relations that are not grounded in any explicit external model of the world, but rather simply point to substrings in the input text. Such outputs are of intrinsically limited value. For example, a system that produces a table of protein-protein interactions is potentially highly valuable if it refers to specific entities in PDB, but of much more limited utility if it outputs only a list of potentially ambiguous symbols and names.

Alongside this work on mostly ungrounded language processing systems, the past few years have seen a considerable body of work on the linguistic and semantic characteristics of a variety of publicly available biomedical data sources, including gene names and Gene Ontology terms. Much of this work was presented at the PSB sessions on biomedical language processing listed above or at PSB sessions on ontologies in 2003, 2004, and 2005. However, to date there has been only limited work on bringing these two lines of pursuit together. The logical next step is to follow through on the insights that we have gained into the structure of available data sources and build language processing systems that can not only locate information in texts, but also map it to these explicit knowledge models.

Two recent competitive evaluation tasks from BioCreAtIvE (Critical Assessment of Information Extraction in Biology) showed that it is possible both to build systems that produce grounded outputs and to evaluate them in a principled way. BioCreAtIvE Task 1(b) involved mapping references to genes in free text to specific LocusLink entries. BioCreAtIvE Task 2 involved assigning Gene Ontology terms to journal articles. Taken together, these two tasks demonstrate that it is possible to link the literature to specific entities and to specific concepts. At the same time, they make it clear that there is considerable room for improvement in performance on these tasks.

This PSB session is intended to stimulate work in this area and to drive progress both in language processing and in the use and development of biological resources. It differs from previous PSB sessions on NLP and on ontologies in that it requires that submissions include both an NLP component and a mapping between at least two publicly available data sources.

Submission requirements

To encourage work that results in language processing systems whose output is grounded with respect to public databases, submissions to this session will be required to discuss work on some language processing system whose output includes links between specific entries in at least two publicly available biological data sources. We mean the term “biological” to exclude text collections—for example, MEDLINE/PubMed can be the source of linking assertions, but does not count as a biological data source. The prototypical submission would be one describing work that uses MEDLINE/PubMed literature to cross-connect two biological data sources, such as a system that assigns Gene Ontology terms to LocusLink entries based on processing of MEDLINE/PubMed abstracts, or one that links LocusLink entries to OMIM entries in the same way. We will also accept submissions that are data-source-centric and do not involve mining MEDLINE/PubMed as an intermediary data source, such as a paper on locating Gene Ontology terms in OMIM gene function fields to link GO and OMIM. However, all submissions must have a clear language processing component and must establish clear connections between two or more publicly available biological data sources.

We will define “publicly available” broadly. At a minimum, all NCBI-sponsored data sources will qualify, as will the biomedical vocabularies integrated in the Unified Medical Language System (UMLS) Metathesaurus. We anticipate that potential sources will include, but not be limited to:

Session chairs

Submission information

Papers and posters

The core of PSB consists of rigorously peer-reviewed full-length papers reporting on original work. Accepted papers will be published in a cloth-bound archival proceedings volume, indexed by MEDLINE/PubMed. The best of these will be presented orally in plenary session.

Researchers wishing to present their research without official publication are encouraged to submit a one-page abstract for the poster session.

Important dates

Paper format

All papers must be submitted to Russ Altman in PostScript (.ps), Adobe Acrobat (.pdf), or Microsoft Word (.doc) format. Adobe Acrobat is preferred. Attached files should be named with the last name of the first author (e.g., altman.ps, altman.pdf, or altman.doc). Hardcopy submissions and unprocessed TeX or LaTeX files will be rejected without review.

Every paper must be accompanied by a cover letter which must include the following:

Submitted papers are limited to twelve (12) pages in the PSB publication format. Please format your paper according to the instructions found at http://psb.stanford.edu/psb-online/psb-submit/. If figures cannot easily be resized and placed precisely in the text, then it should be clear that with appropriate modifications, the total manuscript length would be within the page limit.

Color pictures can be printed at the expense of the authors. The fee is $500 per page of color pictures, payable at the time of camera-ready submission.

Contact Russ Altman (russ.altman@stanford.edu) for additional information about paper submission requirements.