Loading…
Exact Analyses of a Simple Heuristic Employed in Array Compression
We consider the classical static dictionary problem of storing a set $S$ of $x$ elements from Universe $[1,n]$ such that membership queries on elements of $S$ can be supported in worst case $O(1)$; time. This can be viewed as a table compression problem in which a table of size $n$ has $x$ ones and...
Saved in:
Published in: | Computer journal 2002-01, Vol.45 (2), p.213-220 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | We consider the classical static dictionary problem of storing a set $S$ of $x$ elements from Universe $[1,n]$ such that membership queries on elements of $S$ can be supported in worst case $O(1)$; time. This can be viewed as a table compression problem in which a table of size $n$ has $x$ ones and $(n-x)$ zeros; in our applications, $x\ll n$, that is, the table is sparse. As is well known, perfect hashing provides an optimum solution to this problem, the space requirement being $O(x)$ with constant time access to any element. Apart from perfect hashing, many applications employ simpler heuristics for compression and rely on setting inherent parameters. Although these may be known to behave usefully, often their precise behaviour is ill-understood. The object of this paper is to precisely analyse the behaviour of one such extremely simple heuristic which is known to give modest compression in practice. For the heuristic we prove that the expected asymptotic space requirement is, at worst, $a(k)n+b(k)x$ and that although its dependency on $n$ is inherent, it can be made arbitrarily small. Here $k$ is a parameter and $a(k)$ and $b(k)$ are, respectively, monotonically decreasing and increasing functions. Thus $k$ allows a trade-off between dependency on $n$ and $x$; for example, pairs $(a(k), b(k))$ can be $(0.1,3.26)$, $(0.03,5.57)$ and $(6\times10^{-4},33)$. We also show that for some applications the dependency of the space requirement on $n$ can be made sublinear. The heuristic allows constant time access to any element. Our analyses are over two different models for the uniform probability distribution and we derive exact formulae for the expected space used. We prove that the heuristic gives the same asymptotic performance in both models. |
---|---|
ISSN: | 0010-4620 1460-2067 |
DOI: | 10.1093/comjnl/45.2.213 |