Low-Latency Adaptive Distributed Stream Join System Based on a Flexible Join Model

Bibliographic Details
Published in: Proceedings of the ACM on Management of Data 2024-05, Vol. 2 (3), p. 1-27, Article 150
Main Authors: Wang, Qihang; Zuo, Decheng; Zhang, Zhan; Shu, Yanjun; Liu, Xin; He, Mingxuan
Format: Article
Language: English
Description
Summary: Stream join is a fundamental operation in stream processing and has attracted extensive research because of its high resource consumption and significant impact on system performance. As the theoretical basis of stream join systems, the stream join model greatly affects system performance. State-of-the-art stream join models consume either too many computing resources or too many storage resources, resulting in lower throughput or higher latency. In this paper, we propose a new stream join model for processing arbitrary join predicates, called CoModel, which offers a flexible trade-off between memory and computing resource consumption. More importantly, CoModel can achieve the minimum sum of store operations and join operations among all existing join models, and can therefore achieve the lowest latency and highest throughput when the overhead of the local stream join for each input tuple is approximately constant. We give a trade-off strategy for CoModel and theoretically prove its performance advantages based on queuing theory. Furthermore, we design and implement an adaptive distributed stream join system, CoStream, based on CoModel. CoStream can adaptively adjust its structure according to resource constraints and statistics of the input data. We conduct extensive experiments to evaluate the performance and adaptivity of CoStream, and the results show that CoStream has the lowest latency and highest throughput in various scenarios.
ISSN: 2836-6573
DOI: 10.1145/3654953