Loading…

Faster compressed dictionary matching

Given a set D of d patterns, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) [12] gave an nHk(D)+o(nlogσ)-bit i...

Full description

Saved in:

Bibliographic Details
Published in:	Theoretical computer science 2013-03, Vol.475, p.113-119
Main Authors:	Hon, Wing-Kai, Ku, Tsung-Han, Shah, Rahul, Thankachan, Sharma V., Vitter, Jeffrey Scott
Format:	Article
Language:	English
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Given a set D of d patterns, the dictionary matching problem is to index D such that for any query text T, we can locate the occurrences of any pattern within T efficiently. When D contains a total of n characters drawn from an alphabet of size σ, Hon et al. (2008) [12] gave an nHk(D)+o(nlogσ)-bit index which supports a query in O(\|T\|(logϵn+logd)+occ) time, where ϵ>0 and Hk(D) denotes the kth-order entropy of D. Very recently, Belazzougui (2010) [3] has proposed an elegant scheme, which takes nlogσ+O(n) bits of index space and supports a query in optimal O(\|T\|+occ) time. In this paper, we provide connections between Belazzougui’s index and the XBW compression of Ferragina and Manzini (2005) [8], and show that Belazzougui’s index can be slightly modified to be stored in nHk(D)+O(n) bits, while query time remains optimal; this improves the compressed index by Hon et al. (2008) [12] in both space and time.
ISSN:	0304-3975 1879-2294
DOI:	10.1016/j.tcs.2012.10.050