Loading…

Exact Analyses of a Simple Heuristic Employed in Array Compression

We consider the classical static dictionary problem of storing a set $S$ of $x$ elements from Universe $[1,n]$ such that membership queries on elements of $S$ can be supported in worst case $O(1)$; time. This can be viewed as a table compression problem in which a table of size $n$ has $x$ ones and...

Full description

Saved in:
Bibliographic Details
Published in:Computer journal 2002-01, Vol.45 (2), p.213-220
Main Authors: Gibbons, Alan, Pu, Ida, Muthukrishnan, Muthu
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:We consider the classical static dictionary problem of storing a set $S$ of $x$ elements from Universe $[1,n]$ such that membership queries on elements of $S$ can be supported in worst case $O(1)$; time. This can be viewed as a table compression problem in which a table of size $n$ has $x$ ones and $(n-x)$ zeros; in our applications, $x\ll n$, that is, the table is sparse. As is well known, perfect hashing provides an optimum solution to this problem, the space requirement being $O(x)$ with constant time access to any element. Apart from perfect hashing, many applications employ simpler heuristics for compression and rely on setting inherent parameters. Although these may be known to behave usefully, often their precise behaviour is ill-understood. The object of this paper is to precisely analyse the behaviour of one such extremely simple heuristic which is known to give modest compression in practice. For the heuristic we prove that the expected asymptotic space requirement is, at worst, $a(k)n+b(k)x$ and that although its dependency on $n$ is inherent, it can be made arbitrarily small. Here $k$ is a parameter and $a(k)$ and $b(k)$ are, respectively, monotonically decreasing and increasing functions. Thus $k$ allows a trade-off between dependency on $n$ and $x$; for example, pairs $(a(k), b(k))$ can be $(0.1,3.26)$, $(0.03,5.57)$ and $(6\times10^{-4},33)$. We also show that for some applications the dependency of the space requirement on $n$ can be made sublinear. The heuristic allows constant time access to any element. Our analyses are over two different models for the uniform probability distribution and we derive exact formulae for the expected space used. We prove that the heuristic gives the same asymptotic performance in both models.
ISSN:0010-4620
1460-2067
DOI:10.1093/comjnl/45.2.213