Loading…

Faster compressed dictionary matching

Given a set D of d patterns, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) [12] gave an nHk(D)+o(nlogσ)-bit i...

Full description

Saved in:
Bibliographic Details
Published in:Theoretical computer science 2013-03, Vol.475, p.113-119
Main Authors: Hon, Wing-Kai, Ku, Tsung-Han, Shah, Rahul, Thankachan, Sharma V., Vitter, Jeffrey Scott
Format: Article
Language:English
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Given a set D of d patterns, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) [12] gave an nHk(D)+o(nlogσ)-bit index which supports a query in O(|T|(logϵn+logd)+occ) time, where ϵ>0 and Hk(D) denotes the kth-order entropy of D. Very recently, Belazzougui (2010) [3] has proposed an elegant scheme, which takes nlogσ+O(n) bits of index space and supports a query in optimal O(|T|+occ) time. In this paper, we provide connections between Belazzougui’s index and the XBW compression of Ferragina and Manzini (2005) [8], and show that Belazzougui’s index can be slightly modified to be stored in nHk(D)+O(n) bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) [12] in both space and time.
ISSN:0304-3975
1879-2294
DOI:10.1016/j.tcs.2012.10.050