Loading…

SpaceSaving: an optimal algorithm for frequency estimation and frequent items in the bounded-deletion model

In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose Lazy SpaceSaving...

Full description

Saved in:
Bibliographic Details
Published in:Proceedings of the VLDB Endowment 2022-02, Vol.15 (6), p.1215-1227
Main Authors: Zhao, Fuheng, Agrawal, Divyakant, Abbadi, Amr El, Metwally, Ahmed
Format: Article
Language:English
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:In this paper, we propose the first deterministic algorithms to solve the frequency estimation and frequent item problems in the bounded-deletion model. We establish the space lower bound for solving the deterministic frequent items problem in the bounded-deletion model, and propose Lazy SpaceSaving ± and SpaceSaving ± algorithms with optimal space bound. We develop an efficient implementation of the SpaceSaving ± algorithm that minimizes the latency of update operations using novel data structures. The experimental evaluations testify that SpaceSaving ± has accurate frequency estimations and achieves very high recall and precision across different data distributions while using minimal space. Our experiments clearly demonstrate that, if allowed the same space, SpaceSaving± is more accurate than the state-of-the-art protocols with up to logU - 1/ logU of the items deleted, where U is the size of the input universe. Moreover, motivated by prior work, we propose Dyadic SpaceSaving ± , the first deterministic quantile approximation sketch in the bounded-deletion model.
ISSN:2150-8097
2150-8097
DOI:10.14778/3514061.3514068