TRELLIS+: An Effective Approach for Indexing Genome-Scale Sequences Using Suffix Trees


Benjarath Phoophakdee, Mohammed J. Zaki

Dept. of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, 12180
Email: {phoopb,zaki}@cs.rpi.edu


Pac Symp Biocomput. 2008;:90-101.


Abstract

With advances in high-throughput sequencing methods and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval and analysis. In this paper we present a novel disk-based suffix tree approach, called Trellis+, which uses a new string buffering strategy to scale effectively to massive amounts of sequence data with only a limited amount of main memory. We show experimentally that Trellis+ outperforms existing suffix tree approaches: it can index genome-scale sequences (e.g., the entire human genome), and it supports rapid query processing over the disk-based index.

Availability: TRELLIS+ source code is available online at http://www.cs.rpi.edu/zaki/software/trellis
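To make concrete the kind of query a suffix tree index answers, here is a minimal sketch of a naive, in-memory suffix trie that reports every position where a pattern occurs in a text. This is only an illustration of the underlying idea and is not Trellis+: it keeps the whole structure in main memory, uses O(n^2) space, and does not reproduce the disk-based construction or the string buffering strategy described above. The class and method names (`SuffixTrie`, `find`) are hypothetical.

```python
from typing import Dict, List


class _Node:
    """One trie node: outgoing edges keyed by character, plus the start
    positions of every suffix whose path passes through this node."""

    __slots__ = ("children", "starts")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.starts: List[int] = []


class SuffixTrie:
    """Hypothetical, naive uncompressed suffix trie (illustration only).

    Every suffix text[i:] is inserted character by character, so a
    substring query reduces to walking the pattern from the root.
    """

    def __init__(self, text: str) -> None:
        self.root = _Node()
        for i in range(len(text)):
            node = self.root
            for ch in text[i:]:
                node = node.children.setdefault(ch, _Node())
                # Suffixes are inserted in increasing i, so starts stays sorted.
                node.starts.append(i)

    def find(self, pattern: str) -> List[int]:
        """Return sorted start positions of pattern in the indexed text."""
        node = self.root
        for ch in pattern:
            if ch not in node.children:
                return []
            node = node.children[ch]
        return list(node.starts)


if __name__ == "__main__":
    trie = SuffixTrie("GATTACA")
    print(trie.find("A"))    # [1, 4, 6]
    print(trie.find("TA"))   # [3]
    print(trie.find("CAT"))  # []
```

Query time depends only on the pattern length, not on the text length. That property is what makes suffix trees attractive for genome-scale search. The practical difficulty the abstract addresses is that such structures quickly exceed main memory for sequences the size of a genome.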

