Linearized Suffix Tree: an Efficient Index Data Structure with the Capabilities of Suffix Trees and Suffix Arrays
Published in: | Algorithmica 2008-11, Vol.52 (3), p.350-377 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays. We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space. When the size of an alphabet is small, enhanced suffix arrays are such index data structures. However, when the size of an alphabet is large, enhanced suffix arrays lose the power of suffix trees. Pattern searching in an enhanced suffix array takes O(m|Σ|) time while pattern searching in a suffix tree takes O(m log |Σ|) time where m is the length of a pattern and Σ is an alphabet. In this paper, we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O(m log |Σ|) time. In a different point of view, it can be considered a practical implementation of the suffix tree supporting O(m log |Σ|)-time pattern search. In addition, we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O(n) time without using the range minima query. Our experimental results show that our algorithms are faster than the previous algorithms. |
---|---|
ISSN: | 0178-4617 1432-0541 |
DOI: | 10.1007/s00453-007-9061-2 |
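
For readers unfamiliar with the structures the summary compares, the following minimal sketch shows pattern search on a plain suffix array using binary search. This is a swapped-in baseline technique, not the linearized suffix tree or the enhanced suffix array described in the paper: it runs in O(m log n) time per query rather than O(m log |Σ|). The function names `build_suffix_array` and `find_occurrences` are hypothetical, and the construction is a naive sort chosen only for brevity.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order.

    Naive construction by sorting (assumption: illustration only; this is
    not a linear-time construction algorithm).
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions of pattern in text.

    Uses two binary searches over the suffix array sa. Each comparison
    looks at no more than m = len(pattern) characters, so a query costs
    O(m log n) character comparisons.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(find_occurrences(text, sa, "ana"))
```

All suffixes that begin with the pattern occupy one contiguous interval of the suffix array, which is why two binary searches are enough to locate every occurrence.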