Loading…
Collage system: a unifying framework for compressed pattern matching
We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D . The basic operations are concatenation,...
Saved in:
Published in: | Theoretical computer science 2003-04, Vol.298 (1), p.253-272 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | We introduce a general framework which is suitable to capture the essence of
compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary
D
and sequence
S
of phrases in
D
. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel–Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in
O((||
D||+|
S|)·height(
D)+m
2+r)
time with
O(||
D||+m
2)
space, where
||
D||
is the size of
D
,
|
S|
is the number of tokens in
S
,
height(
D)
is the maximum dependency of tokens in
D
,
m is the pattern length, and
r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is
O(||
D||+|
S|+m
2+r)
. |
---|---|
ISSN: | 0304-3975 1879-2294 |
DOI: | 10.1016/S0304-3975(02)00426-7 |