Loading…

Cache-oblivious index for approximate string matching

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k , we want to construct an index of T such that for any input pattern P , we can find all its k -error matches in T efficiently. This problem is well-s...

Full description

Saved in:
Bibliographic Details
Published in:Theoretical computer science 2011-07, Vol.412 (29), p.3579-3588
Main Authors: Hon, Wing-Kai, Lam, Tak-Wah, Shah, Rahul, Tam, Siu-Lung, Vitter, Jeffrey Scott
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k , we want to construct an index of T such that for any input pattern P , we can find all its k -error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O ( ( n log k n ) / B ) disk pages and finds all k -error matches with O ( ( | P | + o c c ) / B + log k n log log B n ) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω ( | P | + o c c + poly ( log n ) ) I/Os. The second index reduces the space to O ( ( n log n ) / B ) disk pages, and the I/O complexity is O ( ( | P | + o c c ) / B + log k ( k + 1 ) n log log n ) .
ISSN:0304-3975
1879-2294
DOI:10.1016/j.tcs.2011.03.004