
Two-stage scheduling for a fluctuant big data stream on heterogeneous servers with multicores in a data center

Bibliographic Details
Published in: Cluster Computing, 2024-04, Vol. 27 (2), pp. 1581-1597
Main Authors: Wang, Shun; Zeng, Guo-sun
Format: Article
Language: English
Description
Summary: Rapid processing with low latency and high throughput is a critical requirement for big data stream applications. However, interference among stream processing tasks in a data center lowers the utilization of computational resources and prolongs task latency. We therefore study an optimal scheduling method for processing a big data stream on heterogeneous multicore servers in a data center. We model big data stream processing and the scheduling problem in terms of four objects: streaming data items, processing tasks, computational nodes, and the cores inside each computational node. An interference model based on regression analysis and a prediction model based on the Autoregressive Integrated Moving Average (ARIMA) are presented. We then propose a two-stage scheduling method consisting of fine-grained core scheduling and coarse-grained node scheduling. In the core scheduling stage, we design a core scheduling algorithm named CS_TDF. In the node scheduling stage, we design a node scheduling algorithm named NS_ITF for a single time window and a continuous scheduling algorithm named PS_UIM for the entire data stream across all time windows. The experimental results show that our scheduling method achieves low interference and high computational resource utilization.
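The summary mentions a prediction model based on the Autoregressive Integrated Moving Average (ARIMA) for a fluctuating stream. The authors' model order and fitting procedure are not given here. As a rough illustration of the general technique only, the sketch below fits an ARIMA(1,1,0) model: it differences the series once, fits an AR(1) coefficient by ordinary least squares, and forecasts the next time window. The input `volumes` (data items per time window) and the function names are hypothetical.

```python
# Minimal ARIMA(1,1,0) sketch (not the authors' model):
#   d = 1: difference the series once,
#   p = 1: regress each difference on the previous difference (least squares),
#   q = 0: no moving-average term.
# Assumption: the input is a hypothetical per-time-window count of stream items.

def fit_arima_110(series):
    """Return (phi, c) for the model dy[t] = c + phi * dy[t-1] + e[t]."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x = diffs[:-1]  # lagged differences dy[t-1]
    y = diffs[1:]   # current differences dy[t]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx if sxx != 0 else 0.0  # constant differences: no AR signal
    c = mean_y - phi * mean_x
    return phi, c


def forecast_next(series):
    """One-step-ahead forecast; adding to the last value undoes the differencing."""
    phi, c = fit_arima_110(series)
    last_diff = series[-1] - series[-2]
    return series[-1] + c + phi * last_diff


if __name__ == "__main__":
    volumes = [120, 135, 160, 150, 170, 190, 185]  # hypothetical window counts
    print(round(forecast_next(volumes), 2))
```

In a window-based scheduler, a forecast like this could be used to estimate the load of the next time window before tasks are placed on nodes and cores. Whether the paper uses the forecast this way is not stated in this record.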
ISSN: 1386-7857
EISSN: 1573-7543
DOI: 10.1007/s10586-023-04044-4