Loading…
SpaceSaving: an optimal algorithm for frequency estimation and frequent items in the bounded-deletion model
In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose Lazy SpaceSaving...
Saved in:
Published in: | Proceedings of the VLDB Endowment 2022-02, Vol.15 (6), p.1215-1227 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the
bounded-deletion
model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose Lazy SpaceSaving
±
and SpaceSaving
±
algorithms with optimal space bound. We develop an efficient implementation of the SpaceSaving
±
algorithm that minimizes the latency of update operations using novel data structures. The experimental evaluations testify that SpaceSaving
±
has accurate frequency estimations and achieves very high recall and precision across different data distributions while using minimal space. Our experiments clearly demonstrate that, if allowed the same space, SpaceSaving± is more accurate than the state-of-the-art protocols with up to
logU
- 1/
logU
of the items deleted, where
U
is the size of the input universe. Moreover, motivated by prior work, we propose Dyadic SpaceSaving
±
, the first deterministic quantile approximation sketch in the bounded-deletion model. |
---|---|
ISSN: | 2150-8097 2150-8097 |
DOI: | 10.14778/3514061.3514068 |