Pash 2.0: Scaleable Sequence Anchoring for Next-Generation Sequencing Technologies

Cristian Coarfa, Aleksandar Milosavljevic


Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza Houston, Texas 77030, USA

Pac Symp Biocomput. 2008;:102-113.


Abstract

Many applications of next-generation sequencing technologies involve anchoring of a sequence fragment or a tag onto a corresponding position on a reference genome assembly. Positional Hashing method, implemented in the Pash 2.0 program, is specifically designed for the task of high-volume anchoring. In this article we present multi-diagonal gapped kmer collation and other improvements introduced in Pash 2.0 that further improve accuracy and speed of Positional Hashing. The goal of this article is to show that gapped kmer matching with cross-diagonal collation suffices for anchoring across close evolutionary distances and for the purpose of human resequencing. We propose a benchmark for evaluating the performance of anchoring programs that captures key parameters in specific applications, including duplicative structure of genomes of humans and other species. We demonstrate speedups of up to tenfold in large-scale anchoring experiments achieved by PASH 2.0 when compared to BLAT, another similarity search program frequently used for anchoring.


[Full-Text PDF] [PSB Home Page]