Loading…

Collage system: a unifying framework for compressed pattern matching

We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D . The basic operations are concatenation,...

Full description

Saved in:
Bibliographic Details
Published in:Theoretical computer science 2003-04, Vol.298 (1), p.253-272
Main Authors: Kida, Takuya, Matsumoto, Tetsuya, Shibata, Yusuke, Takeda, Masayuki, Shinohara, Ayumi, Arikawa, Setsuo
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We introduce a general framework which is suitable to capture the essence of compressed pattern matching according to various dictionary-based compressions. It is a formal system to represent a string by a pair of dictionary D and sequence S of phrases in D . The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel–Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((|| D||+| S|)·height( D)+m 2+r) time with O(|| D||+m 2) space, where || D|| is the size of D , | S| is the number of tokens in S , height( D) is the maximum dependency of tokens in D , m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(|| D||+| S|+m 2+r) .
ISSN:0304-3975
1879-2294
DOI:10.1016/S0304-3975(02)00426-7