Effective and efficient top-k query processing over incomplete data streams
Published in: | Information sciences 2021-01, Vol.544, p.343-371 |
---|---|
Format: | Article |
Language: | English |
Summary: | • Formalize an important problem, top-k query over incomplete data streams (Topk-iDS).
• Propose effective and efficient cost-model-based data imputation techniques.
• Devise effective pruning strategies to reduce the Topk-iDS search space.
• Design effective indexes and efficient algorithms to tackle the Topk-iDS problem.
• Conduct extensive experiments to show the good performance of our Topk-iDS approach.
Efficient and effective stream processing has become increasingly important in many real-world applications, such as sensor data monitoring, network intrusion detection, and IP network traffic analysis. In practice, stream data often have missing attribute values, for reasons such as packet loss and network congestion or failure. In such a scenario, it is important, yet challenging, to accurately and efficiently monitor the top-k objects over incomplete data streams, since these objects may indicate dangerous and critical security events (e.g., fire, network intrusion, or denial-of-service attack). In this paper, we formally define the problem of top-k query over incomplete data streams (Topk-iDS), which continuously detects the top-k objects with the highest ranking scores over an incomplete data stream. To handle the unique challenges of incompleteness and stream processing, we propose a cost-model-based data imputation approach, design effective pruning strategies to reduce the Topk-iDS search space, and carefully devise dynamically updated data synopses to facilitate Topk-iDS query processing. We also propose an efficient algorithm that performs data imputation and incremental Topk-iDS computation at the same time. Finally, through extensive experiments, we evaluate the efficiency and effectiveness of our proposed Topk-iDS query answering approach over both real and synthetic data sets. |
---|---|
ISSN: | 0020-0255 1872-6291 |
DOI: | 10.1016/j.ins.2020.08.011 |
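
To make the problem statement in the summary concrete, the following is a minimal sketch of a top-k query over a sliding window of an incomplete stream. It is not the paper's method: the cost-model-based imputation, pruning strategies, and synopses are not reproduced here. As stated assumptions, missing attributes are marked with `None`, imputation is a simple per-attribute mean over the observed values in the current window, the ranking score is the sum of an object's attributes, and the function name `topk_over_incomplete_stream` is hypothetical.

```python
from collections import deque
import heapq


def topk_over_incomplete_stream(stream, k, window_size):
    """Yield the top-k (score, object_id) pairs after each arrival.

    Assumptions (not from the paper): missing values are None,
    imputation uses the per-attribute window mean, and the ranking
    score is the sum of an object's (imputed) attributes.
    """
    window = deque(maxlen=window_size)  # oldest object is evicted automatically
    for obj_id, attrs in stream:
        window.append((obj_id, attrs))
        dims = len(attrs)

        # Per-attribute mean over the observed (non-None) values in the window.
        means = []
        for d in range(dims):
            observed = [a[d] for _, a in window if a[d] is not None]
            means.append(sum(observed) / len(observed) if observed else 0.0)

        # Fill in missing values, then score every object in the window.
        scored = []
        for oid, a in window:
            filled = [means[d] if a[d] is None else a[d] for d in range(dims)]
            scored.append((sum(filled), oid))

        # Recompute the top-k from scratch; a naive baseline with no pruning.
        yield heapq.nlargest(k, scored)


if __name__ == "__main__":
    demo = [("a", (1.0, 2.0)), ("b", (None, 4.0)), ("c", (5.0, None))]
    for result in topk_over_incomplete_stream(demo, k=2, window_size=3):
        print(result)
```

This baseline re-imputes and re-ranks the whole window on every arrival, which is the cost that the pruning strategies and incrementally maintained synopses described in the summary are meant to avoid.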