Improving the Efficiency of Biomarker Identification Using Biological Knowledge

John H. Phan1, Qiqin Yin-Goen2, Andrew N. Young2, and May D. Wang1

1The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, Atlanta, GA 30332, USA; 2Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA 30322, USA

Pacific Symposium on Biocomputing 14:427-438(2009)


Identifying and validating biomarkers from high-throughput gene expression data is important for understanding and treating cancer. Typically, we identify candidate biomarkers as features that are differentially expressed between two or more classes of samples. Many feature selection metrics rely on ranking by some measure of differential expression. However, interpreting these results is difficult because of the large variety of existing algorithms and metrics, each of which may produce different results. Moreover, a feature ranking metric may work well on some datasets but perform considerably worse on others. We propose a method to choose an optimal feature ranking metric for each individual dataset. A metric is optimal if, for a particular dataset, it favorably ranks features that are known to be relevant biomarkers. Extensive knowledge of biomarker candidates is available in public databases and literature. Using this knowledge, we can choose a ranking metric that produces the most biologically meaningful results. In this paper, we first describe a framework for assessing the ability of a ranking metric to detect known relevant biomarkers. We then apply this method to clinical renal cancer microarray data to choose an optimal metric and identify several candidate biomarkers.
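
The idea of choosing a ranking metric by how favorably it ranks known biomarkers can be illustrated with a minimal sketch. This is not the framework described in the paper: the two candidate metrics (a Welch t-statistic and a difference of class means), the mean-rank criterion, and all function names are assumptions made here for illustration only.

```python
import numpy as np

def t_statistic(X, y):
    """Absolute Welch t-statistic per feature (rows = samples, columns = features)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / se

def fold_change(X, y):
    """Absolute difference of class means (assumes log-scale expression values)."""
    return np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))

def mean_rank_of_known(scores, known_idx):
    """Average rank (1 = top) that a score vector assigns to the known biomarkers."""
    order = np.argsort(-scores)                  # feature indices, best first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks[known_idx].mean()

def choose_metric(X, y, known_idx, metrics):
    """Hypothetical selection rule: pick the metric that ranks known biomarkers highest.

    Returns the winning metric name and the mean rank obtained by every metric.
    """
    results = {name: mean_rank_of_known(fn(X, y), known_idx)
               for name, fn in metrics.items()}
    return min(results, key=results.get), results
```

Here `X` is a samples-by-features expression matrix, `y` is a 0/1 class label vector, and `known_idx` lists the columns of features already reported as relevant biomarkers. Calling `choose_metric(X, y, known_idx, {"t_statistic": t_statistic, "fold_change": fold_change})` returns the metric whose ranking places those features nearest the top for that dataset.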
