Stochastic distributed data stream partitioning using task locality: design, implementation, and optimization

Bibliographic Details
Published in:	The Journal of supercomputing 2021, Vol.77 (10), p.11353-11389
Main Authors:	Son, Siwoon, Im, Hyeonseung, Moon, Yang-Sae
Format:	Article
Language:	English
Subjects:	Batch processing; Compilers; Computer Science; Configuration management; Data transmission; Design optimization; Engines; Interpreters; Messages; Partitioning; Performance degradation; Processor Architectures; Programming Languages

Description
Summary:	Distributed stream processing engines (DSPEs) provide stream partitioning methods for distributing messages to tasks deployed in the distributed environment for real-time stream processing. Among these methods, the original locality-aware stream partitioning (LSP) is a binary LSP that sends messages only to downstreams on the same node as upstreams. The binary LSP degrades performance at general configurations because it focuses only on task locality and does not consider downstream status like distributed batch processing engines. In this paper, we propose a Stochastic LSP (SLSP) method that considers not only task locality but also downstream status by computing stream partitioning probability based on the round-trip time to downstreams. We also present coarse-grained and fine-grained methods for probing downstreams at the node level and the process level, respectively. Then, we optimize our SLSP using a weighted closeness to prioritize the partitioning probabilities and a parallel thread model to process each stage of the SLSP in parallel. Finally, we implement the SLSP in Apache Storm, a representative DSPE, and empirically evaluate it against the binary LSP. Experimental results show that our SLSP greatly reduces latency by up to 208% while maintaining a similar throughput compared to the binary LSP at general configurations. These results indicate that our SLSP performs optimized stream partitioning by reflecting downstream status as well as task locality.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-021-03725-4
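
Illustration:	The summary describes choosing a downstream task with a probability derived from its round-trip time (RTT). The sketch below shows one way such a probability could be computed. It is not the authors' formula, which the summary does not give. The inverse-RTT weighting, the optional closeness multiplier, and the names partition_probabilities and choose_downstream are all assumptions made for illustration.

```python
import random


def partition_probabilities(rtt_ms, closeness=None):
    """Return a dict mapping each downstream task to a selection probability.

    rtt_ms: dict of downstream id -> measured round-trip time in ms (> 0).
    closeness: optional dict of downstream id -> non-negative weight.
        Hypothetical stand-in for the "weighted closeness" idea, for
        example a larger weight for tasks on the same node.

    Assumption: a task's score is weight / RTT, so lower RTT and higher
    closeness both raise the probability. The paper may use another rule.
    """
    if not rtt_ms:
        raise ValueError("at least one downstream is required")
    scores = {}
    for task, rtt in rtt_ms.items():
        if rtt <= 0:
            raise ValueError(f"RTT for {task!r} must be positive")
        weight = 1.0 if closeness is None else closeness.get(task, 1.0)
        if weight < 0:
            raise ValueError(f"closeness for {task!r} must be non-negative")
        scores[task] = weight / rtt
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("all downstream scores are zero")
    # Normalize so the probabilities sum to 1.
    return {task: score / total for task, score in scores.items()}


def choose_downstream(probabilities, rng=random):
    """Pick one downstream task at random according to its probability."""
    tasks = list(probabilities)
    weights = [probabilities[task] for task in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]
```

With RTTs of 1 ms for a local task and 3 ms for a remote one, this sketch sends about three quarters of the messages to the local task instead of all of them, which is the kind of softening of the binary LSP that the summary describes.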