A-RCL: An Adaptive Incident Prediction and Automated Root Cause Localization System for Cloud Environment
Main Authors: | , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Summary: | Cloud systems are becoming increasingly complex to accommodate the growth of cloud services, especially in private cloud environments. In a mixed environment where containerized applications run on virtual machines hosted on the physical machines of the cloud infrastructure, a single failure may simultaneously trigger multiple alarms in the cloud system. Root cause localization therefore remains a daunting task. In this paper, we propose an automated, real-time root cause localization system named A-RCL, which takes a multi-layer approach to monitoring and localizing system incidents. We present a mechanism that locates the root cause by combining machine-learning-based predictive methods, which detect incidents in the system early, with automatic root cause identification. We implement and evaluate A-RCL on a comprehensive real private cloud testbed. The evaluation demonstrates that A-RCL achieves high accuracy of 93.99% and 98.12% in incident prediction and root cause localization, respectively. |
---|---|
ISSN: | 2374-9709 |
DOI: | 10.1109/NOMS56928.2023.10154309 |