Loading…
Cache-oblivious index for approximate string matching
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k , we want to construct an index of T such that for any input pattern P , we can find all its k -error matches in T efficiently. This problem is well-s...
Saved in:
Published in: | Theoretical computer science 2011-07, Vol.412 (29), p.3579-3588 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text
T
of length
n
and a positive integer
k
, we want to construct an index of
T
such that for any input pattern
P
, we can find all its
k
-error matches in
T
efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies
O
(
(
n
log
k
n
)
/
B
)
disk pages and finds all
k
-error matches with
O
(
(
|
P
|
+
o
c
c
)
/
B
+
log
k
n
log
log
B
n
)
I/Os, where
B
denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require
Ω
(
|
P
|
+
o
c
c
+
poly
(
log
n
)
)
I/Os. The second index reduces the space to
O
(
(
n
log
n
)
/
B
)
disk pages, and the I/O complexity is
O
(
(
|
P
|
+
o
c
c
)
/
B
+
log
k
(
k
+
1
)
n
log
log
n
)
. |
---|---|
ISSN: | 0304-3975 1879-2294 |
DOI: | 10.1016/j.tcs.2011.03.004 |