Effective and efficient top-k query processing over incomplete data streams

Bibliographic Details
Published in:Information sciences 2021-01, Vol.544, p.343-371
Main Authors: Ren, Weilong; Lian, Xiang; Ghazinour, Kambiz
Format: Article
Language:English
Description
Summary:
• Formalize an important problem, top-k query over incomplete data streams (Topk-iDS).
• Propose effective and efficient cost-model-based data imputation techniques.
• Devise effective pruning strategies to reduce the Topk-iDS search space.
• Design effective indexes and efficient algorithms to tackle the Topk-iDS problem.
• Conduct extensive experiments to show good performance of our Topk-iDS approach.

Efficient and effective stream processing has become increasingly important in many real-world applications, such as sensor data monitoring, network intrusion detection, and IP network traffic analysis. In practice, stream data often have missing attributes, for reasons such as packet losses and network congestion or failure. In such a scenario, it is important, yet challenging, to accurately and efficiently monitor the top-k objects over an incomplete data stream, which may indicate dangerous and critical security events (e.g., fire, network intrusion, or denial-of-service attack). In this paper, we formally define the problem of top-k query over an incomplete data stream (Topk-iDS), which continuously detects the top-k objects with the highest ranking scores over an incomplete data stream. To handle the unique characteristics of this setting, namely incompleteness and stream processing, we propose a cost-model-based data imputation approach, design effective pruning strategies to reduce the Topk-iDS search space, and carefully devise dynamically updated data synopses to facilitate Topk-iDS query processing. We also propose an efficient algorithm that performs data imputation and incremental Topk-iDS computation at the same time. Finally, through extensive experiments, we evaluate the efficiency and effectiveness of our proposed Topk-iDS query answering approach over both real and synthetic data sets.
ISSN:0020-0255, 1872-6291
DOI:10.1016/j.ins.2020.08.011
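
Illustrative sketch: the Topk-iDS problem stated in the summary (continuously reporting the k highest-scoring objects in a stream whose records may have missing attributes) can be pictured with a naive baseline. The Python sketch below is not the authors' method. It assumes a count-based sliding window, fills a missing attribute with the mean of the values observed for that attribute in the current window (a stand-in for the paper's cost-model-based imputation), scores an object by the sum of its attributes, and recomputes the answer from scratch on every arrival, with none of the pruning strategies or data synopses the paper proposes. All names (`impute`, `topk_over_stream`, the record layout) are hypothetical.

```python
import heapq
from collections import deque
from typing import Deque, Iterable, Iterator, List, Optional, Sequence, Tuple

# Assumed record layout: (object id, attribute values), where None marks a missing value.
Record = Tuple[int, Sequence[Optional[float]]]


def impute(attrs: Sequence[Optional[float]], window: Deque[Record]) -> List[float]:
    """Fill each missing attribute with the mean of the values observed for that
    attribute in the current window (0.0 if nothing has been observed).
    Assumption: a simple stand-in, not the paper's imputation technique."""
    filled = []
    for j, value in enumerate(attrs):
        if value is None:
            observed = [a[j] for _, a in window if a[j] is not None]
            value = sum(observed) / len(observed) if observed else 0.0
        filled.append(value)
    return filled


def topk_over_stream(stream: Iterable[Record], k: int, window_size: int) -> Iterator[List[int]]:
    """After each arrival, yield the ids of the k highest-scoring objects in the
    sliding window, best first. Score = sum of (imputed) attributes (assumed)."""
    window: Deque[Record] = deque(maxlen=window_size)  # oldest record expires automatically
    for record in stream:
        window.append(record)
        scored = [(sum(impute(attrs, window)), oid) for oid, attrs in window]
        yield [oid for _, oid in heapq.nlargest(k, scored)]


# Small usage example with made-up data.
stream = [
    (1, [1.0, 2.0]),
    (2, [None, 6.0]),   # first attribute missing
    (3, [4.0, None]),   # second attribute missing
    (4, [0.5, 0.5]),
]
results = list(topk_over_stream(stream, k=2, window_size=3))
print(results)
```

Because every arrival rescans the whole window, this baseline costs time proportional to the window size per record. That per-record cost is what incremental computation, pruning, and synopses of the kind the summary mentions are meant to reduce.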