
Slark: A Performance Robust Decentralized Inter-datacenter Deadline-aware Coflows Scheduling Framework with Local Information

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2024-11, pp. 1-15
Main Authors: Dong, Xiaodong; Nie, Lihai; Liu, Zheli; Xiang, Yang
Format: Article
Language:English
Description
Summary: Inter-datacenter network applications generate massive numbers of coflows with deadline requirements for purposes such as backup, synchronization, and analytics. Decentralized coflow scheduling frameworks are desirable for their scalability in cross-domain deployment, but they grapple with information agnosticism because they lack cross-domain privileges. Current information-agnostic coflow scheduling methods are incompatible with decentralized frameworks because they rely on centralized controllers that continuously monitor and learn from global coflow transmission states to infer global coflow information. Alternative methods propose mechanisms for decentralized gathering and synchronization of global coflow information. However, they require dedicated physical hardware or control logic, which can be impractical for incremental deployment. This paper proposes Slark, a decentralized deadline-aware coflow scheduling framework that meets coflows' soft and hard deadline requirements using only local traffic information. Slark avoids the need for global coflow transmission states and for dedicated hardware or control logic: multiple software-implemented scheduling agents work independently on each node, and each agent incorporates this information agnosticism into node-specific bandwidth allocation by modeling it as a robust optimization problem in which flow information on the other nodes is represented by uncertain parameters. We then validate the performance robustness of Slark by analyzing how the optimal objective function value and the associated optimal solution are perturbed by the uncertain parameters. Finally, we propose a heuristic algorithm based on firebug swarm optimization to handle the non-convexity of the problem. Experimental results show that Slark significantly increases transmission revenue and raises the soft and hard deadline guarantee ratios by 10.52% and 7.99% on average, respectively.
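As a rough illustration of the robust-optimization idea in the summary (each node allocating bandwidth from local information only, with the other nodes' flow information treated as uncertain parameters), the Python sketch below uses a deliberately simplified, hypothetical model. It is not Slark's formulation or algorithm. It assumes that each local flow belongs to a coflow whose flows on other nodes will finish at some unknown time in an interval [remote_lo, remote_hi], that a coflow earns its revenue only if it finishes by a hard deadline, and that the node has a fixed egress capacity. Under this box uncertainty, the worst case is simply remote_hi, so the robust plan admits only flows whose deadlines still hold at that bound. All names (LocalFlow, robust_allocate, the fields, and the sample numbers) are hypothetical. The exhaustive subset search stands in for the firebug-swarm-optimization heuristic the paper uses, and it is only practical for a handful of flows.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LocalFlow:
    """Hypothetical model of one coflow's flow on this node (not Slark's model)."""
    name: str
    size: float       # remaining data on this node (e.g., Gb)
    deadline: float   # hard deadline, seconds from now
    revenue: float    # earned only if the whole coflow meets its deadline
    remote_lo: float  # best-case finish time of the coflow's flows on other nodes
    remote_hi: float  # worst-case finish time (upper end of the uncertainty interval)


def robust_allocate(
    flows: List[LocalFlow], capacity: float, use_worst_case: bool = True
) -> Tuple[float, Dict[str, float]]:
    """Choose the admitted set of flows that maximizes guaranteed revenue.

    A coflow finishes at max(local finish time, remote finish time). With
    use_worst_case=True the unknown remote time is taken at remote_hi (the
    robust counterpart under box uncertainty); with False it is taken at
    remote_lo (an optimistic, nominal plan, for comparison).
    """
    def remote(f: LocalFlow) -> float:
        return f.remote_hi if use_worst_case else f.remote_lo

    # A flow can meet its deadline only if its remote part finishes in time.
    candidates = [f for f in flows if remote(f) <= f.deadline]

    best_rev: float = 0.0
    best_alloc: Dict[str, float] = {}
    # Exhaustive search over admitted subsets (exponential; illustration only).
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            # Minimum bandwidth that lets each flow finish locally by its deadline.
            need = {f.name: f.size / f.deadline for f in subset}
            if sum(need.values()) <= capacity:
                rev = sum(f.revenue for f in subset)
                if rev > best_rev:
                    best_rev, best_alloc = rev, need
    return best_rev, best_alloc


FLOWS = [
    LocalFlow("backup", size=40.0, deadline=10.0, revenue=5.0, remote_lo=2.0, remote_hi=8.0),
    LocalFlow("sync", size=30.0, deadline=5.0, revenue=4.0, remote_lo=1.0, remote_hi=3.0),
    LocalFlow("analytics", size=12.0, deadline=4.0, revenue=3.0, remote_lo=2.0, remote_hi=6.0),
]

if __name__ == "__main__":
    print(robust_allocate(FLOWS, capacity=8.0))                        # (5.0, {'backup': 4.0})
    print(robust_allocate(FLOWS, capacity=8.0, use_worst_case=False))  # (8.0, {'backup': 4.0, 'analytics': 3.0})
```

With these sample flows, the nominal plan that trusts remote_lo also admits analytics and promises a revenue of 8.0. If the analytics coflow's remote flows take their worst-case 6 s, however, it misses its 4 s deadline. The robust plan gives up that revenue in exchange for a guarantee that holds for every value in the uncertainty intervals. For simplicity, leftover capacity is not redistributed, and soft deadlines are not modeled.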
ISSN:1045-9219
DOI:10.1109/TPDS.2024.3508275