
A Chaos Recommendation Tool for Reliability Testing in Large-Scale Cloud-Native Systems

Bibliographic Details
Main Authors: Verma, Mudit; Hans, Sandeep; Saha, Diptikalyan; Jayachandran, Praveen; Farchi, Eitan; Chaitanya Elluri, Naga Ravi; Sebastiani, Tullio; Rubendall, Paige; Subramanian, Yogananth; Surisetty, Pradeep; Riordan, Brian
Format: Conference Proceeding
Language: English
Description
Summary: With the proliferation of cloud-native systems supported by container technology and the widespread deployment of 5G and Edge use-cases, modern applications have become increasingly distributed and complex, often consisting of hundreds of components. Ensuring the reliability of these workloads has consequently grown more intricate, and is further complicated by the continuous evolution of systems supported by CI/CD practices. In this context, Chaos Engineering can play a crucial role in assessing the reliability of these large-scale systems by intentionally introducing adverse conditions and gauging their resilience in interconnected environments. This controlled approach enables organizations to identify and learn from potential failure points before they escalate into full-blown service degradation and production outages. Yet the effectiveness of chaos testing hinges on the relevance of the targeted fault scenarios, and in practice it often relies on arbitrary or intuitive fault injection, leading to inefficiencies and suboptimal outcomes. To address these challenges, we have developed a chaos-recommendation tool. The tool assesses the real-time behavior and characteristics of workloads and suggests fault injections that can cause disruptions. In this demo, we illustrate how the chaos-recommendation tool can be used to automatically identify potential failure points in a system and suggest corresponding chaos test cases. The tool, part of Red Hat's Chaos Engineering project Kraken, is open-source and available at: https://github.com/redhat-chaos/krkn/blob/main/utils/chaos_recommender/README.md
ISSN: 2155-2509
DOI: 10.1109/COMSNETS59351.2024.10427311