EpiLoc: A (Working) Text-Based System for Predicting Protein Subcellular Location

Scott Brady, Hagit Shatkay


School of Computing, Queen’s University, Kingston, Ontario, Canada K7L 3N6

Pac Symp Biocomput. 2008:604-615.


Abstract

Motivation: Predicting the subcellular location of proteins is an active research area, as a protein’s location within the cell provides meaningful cues about its function. Several previous experiments in using text for protein subcellular location prediction varied in methods, applicability, and performance level. In earlier work, we used a preliminary text classification system and focused on integrating text features into a sequence-based classifier to improve location prediction performance. Results: Here the focus shifts to the text-based component itself. We introduce EpiLoc, a comprehensive text-based localization system. We provide an in-depth study of text-feature selection and examine several new ways to associate text with proteins, so that text-based location prediction can be performed for practically any protein. We show that EpiLoc’s performance is comparable to (and may even exceed) that of state-of-the-art sequence-based systems. EpiLoc is available at: http://epiloc.cs.queensu.ca.

