TRELLIS+: An Effective Approach for Indexing Genome-Scale Sequences Using Suffix Trees

Benjarath Phoophakdee, Mohammed J. Zaki

Dept. of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, 12180
Email: {phoopb,zaki}

Pac Symp Biocomput. 2008;:90-101.


With advances in high-throughput sequencing methods and the corresponding exponential growth in sequence data, it has become critical to develop scalable data management techniques for sequence storage, retrieval, and analysis. In this paper we present Trellis+, a novel disk-based suffix tree approach that, through a new string buffering strategy, scales effectively to massive amounts of sequence data while using only a limited amount of main memory. We show experimentally that Trellis+ outperforms existing suffix tree approaches: it can index genome-scale sequences (e.g., the entire human genome), and it also supports rapid query processing over the disk-based index. Availability: TRELLIS+ source code is available online at
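For readers new to suffix-tree indexing, the minimal Python sketch below illustrates the kind of query such an index answers: finding every occurrence of a pattern in a sequence by walking down from the root. It is a hypothetical illustration, not the TRELLIS+ algorithm. It builds a naive, uncompressed suffix trie entirely in main memory, a structure whose size can grow quadratically with sequence length and so would not scale to genome-scale input. It does not model TRELLIS+'s disk-based partitioning or its string buffering strategy. The names `TrieNode`, `build_suffix_trie`, and `find_occurrences` are assumptions introduced only for this example.

```python
from typing import Dict, List


class TrieNode:
    """One node of a naive (uncompressed) suffix trie. Illustrative only."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        # Start positions of every suffix whose path passes through this node.
        self.starts: List[int] = []


def build_suffix_trie(seq: str) -> TrieNode:
    """Insert every suffix of seq + '$' into an in-memory trie.

    Worst-case time and space are quadratic in len(seq); this is a toy,
    not a disk-based or compressed suffix tree.
    """
    text = seq + "$"  # unique terminator so no suffix is a prefix of another
    root = TrieNode()
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.children.setdefault(ch, TrieNode())
            node.starts.append(i)
    return root


def find_occurrences(root: TrieNode, pattern: str) -> List[int]:
    """Return the sorted start positions of pattern, or [] if it is absent."""
    node = root
    for ch in pattern:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return sorted(node.starts)


if __name__ == "__main__":
    root = build_suffix_trie("ACGTACGA")
    print(find_occurrences(root, "ACG"))  # [0, 4]
    print(find_occurrences(root, "TT"))   # []
```

A true suffix tree collapses chains of single-child nodes into labeled edges, which brings its size down to linear in the sequence length. Even so, for inputs the size of the human genome the tree no longer fits in main memory, which is the problem a disk-based approach such as the one described above targets.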
