Loading…

Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays

Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using...

Full description

Saved in:
Bibliographic Details
Published in:Algorithmica 2008-11, Vol.52 (3), p.350-377
Main Authors: Kim, Dong Kyue, Kim, Minhwan, Park, Heejin
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays. We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space. When the size of an alphabet is small, enhanced suffix arrays are such index data structures. However, when the size of an alphabet is large, enhanced suffix arrays lose the power of suffix trees. Pattern searching in an enhanced suffix array takes O ( m |Σ|) time while pattern searching in a suffix tree takes O ( m log |Σ|) time where m is the length of a pattern and Σ is an alphabet. In this paper, we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O ( m log |Σ|) time. In a different point of view, it can be considered a practical implementation of the suffix tree supporting O ( m log |Σ|)-time pattern search. In addition, we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O ( n ) time without using the range minima query. Our experimental results show that our algorithms are faster than the previous algorithms.
ISSN:0178-4617
1432-0541
DOI:10.1007/s00453-007-9061-2