
Robustness challenges in Reinforcement Learning based time-critical cloud resource scheduling: A Meta-Learning based solution


Bibliographic Details
Published in: Future Generation Computer Systems, 2023-09, Vol. 146, pp. 18-33
Main Authors: Liu, Hongyun, Chen, Peng, Ouyang, Xue, Gao, Hui, Yan, Bing, Grosso, Paola, Zhao, Zhiming
Format: Article
Language: English
Description
Summary: Cloud computing attracts increasing attention for processing dynamic computing tasks and for automating the software development and operation pipeline. In many cases, the computing tasks have strict deadlines. The cloud resource manager (e.g., the orchestrator) manages resources and provides tasks with Quality-of-Service (QoS) guarantees. Cloud task scheduling is challenging because task workloads and resource availability are highly dynamic. Reinforcement Learning (RL) has attracted considerable research attention in scheduling. However, RL-based approaches suffer from low robustness in scheduling performance when the task workload and resource availability change, particularly when handling time-critical tasks. This paper addresses both challenges, robustness and deadline guarantees, in RL-based, specifically Deep RL (DRL)-based, scheduling approaches. We quantify robustness as the retraining time required after such changes and investigate how to improve both the robustness and the deadline guarantees of DRL-based scheduling. We propose MLR-TC-DRLS, a practical, robust Meta Deep Reinforcement Learning-based scheduling solution that provides deadline guarantees for time-critical tasks and fast adaptation under highly dynamic conditions. We comprehensively evaluate MLR-TC-DRLS against scheduling approaches based on RL and on advanced RL variants, using real-world and synthetic data. The evaluations validate that our proposed approach improves the scheduling robustness of typical DRL-variant scheduling approaches, achieving 97%–98.5% deadline guarantees and 200%–500% faster adaptation.
• Improving the robustness of Reinforcement Learning-based task scheduling.
• Enhancing Reinforcement Learning-based scheduling.
• Providing a Meta-Learning-based robust time-critical deep reinforcement learning scheduling (MLR-TC-DRLS) algorithm.
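The abstract's core idea, meta-learning an initialization that adapts quickly when the environment shifts, can be illustrated with a generic first-order MAML-style sketch. This is not the authors' MLR-TC-DRLS algorithm: the tasks here are toy 1-D quadratic losses standing in for scheduling environments under different workload regimes, and all names (`inner_adapt`, `maml_train`) and hyperparameters are illustrative assumptions.

```python
import random

# Toy illustration of first-order MAML-style meta-learning (NOT the paper's
# MLR-TC-DRLS algorithm). Each "task" is a 1-D quadratic loss
# L_c(w) = (w - c)^2 with a task-specific optimum c, standing in for a
# scheduling environment whose workload has shifted. The meta-learner seeks
# an initialization w0 from which one gradient step adapts well to any task.

def inner_adapt(w, c, alpha=0.3):
    """One inner-loop gradient step on the task loss (w - c)^2."""
    return w - alpha * 2.0 * (w - c)

def maml_train(task_centers, meta_steps=300, alpha=0.3, beta=0.05, seed=0):
    """First-order MAML: update w0 using the post-adaptation gradient."""
    rng = random.Random(seed)
    w0 = 5.0  # deliberately poor starting initialization
    for _ in range(meta_steps):
        c = rng.choice(task_centers)        # sample a task (workload regime)
        w_adapted = inner_adapt(w0, c, alpha)
        meta_grad = 2.0 * (w_adapted - c)   # first-order MAML approximation
        w0 -= beta * meta_grad              # outer (meta) update
    return w0

centers = [-1.0, 0.0, 1.0]
w0 = maml_train(centers)
# After meta-training, w0 lies near the mean task optimum, so a single
# adaptation step lands far closer to a new task's optimum than adapting
# from the original initialization.
```

In the paper's setting the same principle would apply to the DRL scheduling policy's parameters rather than a scalar, which is the mechanism behind the reported 200%–500% faster adaptation after workload or resource changes.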
ISSN:0167-739X
1872-7115
DOI:10.1016/j.future.2023.03.029