Energy-aware dynamic response and efficient consolidation strategies for disaster survivability of cloud microservices architecture

Bibliographic Details
Published in: Computing 2024-08, Vol.106 (8), p.2737-2783
Main Authors: Fé, Iure, Nguyen, Tuan Anh, Mauro, Mario Di, Postiglione, Fabio, Ramos, Alex, Soares, André, Choi, Eunmi, Min, Dugki, Lee, Jae Woo, Silva, Francisco Airton
Format: Article
Language: English
Description
Summary: Computer system resilience refers to the ability of a computer system to continue functioning even in the face of unexpected events or disruptions. These disruptions can be caused by a variety of factors, such as hardware failures, software glitches, cyber attacks, or even natural disasters. Modern computational environments need applications that can recover quickly from major disruptions while also being environmentally sustainable. Balancing system resilience with energy efficiency is challenging, as efforts to improve one can harm the other. This paper presents a method to enhance disaster survivability in microservice architectures, particularly those using Kubernetes in cloud-based environments, focusing on optimizing electrical energy use. To save energy, our work adopts a consolidation strategy, which groups multiple microservices on a single host. Our approach uses a widely adopted analytical model, the Generalized Stochastic Petri Net (GSPN). GSPNs are a powerful modeling technique used in various fields, including engineering, computer science, and operations research. One of the primary advantages of GSPNs is their ability to model complex systems with a high degree of accuracy. Additionally, GSPNs allow the modeling of both logical and stochastic behavior, making them well suited to systems that combine the two. Our GSPN models compute a number of metrics, such as recovery time, system availability, reliability, Mean Time to Failure, and the configuration of cloud-based microservices. We compared our approach against others focusing on survivability or efficiency. Our approach aligns with Recovery Time Objectives during sudden disasters and offers the fastest recovery, requiring 9% less warning time to fully recover in cases of disaster with alert when compared to strategies with similar electrical consumption. It also saves about 27% energy compared to low-consolidation strategies and 5% compared to high-consolidation strategies under static conditions.
ISSN: 0010-485X, 1436-5057
DOI: 10.1007/s00607-024-01305-x