Efficient straggler task management in cloud environment using stochastic gradient descent with momentum learning-driven neural networks
Published in: | Cluster computing 2024-07, Vol.27 (4), p.4673-4685 |
---|---|
Format: | Article |
Language: | English |
Summary: | Large-scale computing systems distribute tasks into smaller units that execute simultaneously, which accelerates job completion and reduces energy usage. However, cloud computing systems face a significant challenge, the Long Tail problem, which arises when a small subset of slow-performing tasks (stragglers) impedes the overall progress of parallel job execution, resulting in longer service response times and decreased system efficiency. To reduce task execution time and energy consumption, this paper proposes an efficient straggler task management framework for cloud data centers. First, a neural network-based resource predictor is developed and tuned with Stochastic Gradient Descent with Momentum to analyze heterogeneous tasks and classify them into stragglers and non-stragglers. The identified straggler tasks are then further classified into two categories, Resource Hunters and Long-Tail stragglers, based on their specific resource requirements. A task management policy that takes the task category into account then schedules and allocates resources among user job requests to achieve parallelism and enhance sustainability in the cloud infrastructure. The proposed work is evaluated through extensive simulations on the Google Cluster Dataset (GCD), and the results are compared with state-of-the-art techniques. The experimental results show reductions of up to 55.16% in power consumption, up to 35% in the number of active servers, and up to 67.74% in execution time. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-023-04191-8 |
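
The summary mentions a neural network-based predictor tuned with Stochastic Gradient Descent with Momentum to separate stragglers from non-stragglers. The record does not give the network architecture, input features, or hyperparameters, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes a single-neuron (logistic) classifier; the two input features, the toy data, and the names `train_sgd_momentum` and `is_straggler` are hypothetical and chosen purely for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd_momentum(samples, labels, lr=0.1, momentum=0.9, epochs=200, seed=0):
    """Train a single-neuron classifier with SGD plus classical momentum."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    vel_w = [0.0] * n_features  # one velocity term per weight
    vel_b = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic: visit samples in random order
        for i in order:
            x, y = samples[i], labels[i]
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # cross-entropy gradient w.r.t. the pre-activation
            for j in range(n_features):
                # Momentum update: keep a fraction of the previous step,
                # then add the new gradient step.
                vel_w[j] = momentum * vel_w[j] - lr * err * x[j]
                weights[j] += vel_w[j]
            vel_b = momentum * vel_b - lr * err
            bias += vel_b
    return weights, bias


def is_straggler(weights, bias, x, threshold=0.5):
    # Predict "straggler" when the estimated probability reaches the threshold.
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= threshold


# Hypothetical, made-up features: (normalized CPU demand, normalized runtime).
# Label 1 = straggler, 0 = non-straggler. Not taken from the Google Cluster Dataset.
X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.7), (0.85, 0.95)]
y = [0, 0, 0, 1, 1, 1]
w, b = train_sgd_momentum(X, y)
predictions = [int(is_straggler(w, b, x)) for x in X]
```

The momentum term carries part of the previous update into the current one, which tends to smooth the noisy per-sample gradients. The further split of stragglers into Resource Hunters and Long-Tail stragglers is not sketched here, because the record does not state the criteria used.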