A-RCL: An Adaptive Incident Prediction and Automated Root Cause Localization System for Cloud Environment

Bibliographic Details
Main Authors: Ta, Phuong Bac; Duong, Phung Ha; Kim, Younghan
Format: Conference Proceeding
Language:English
Description
Summary:Cloud systems are becoming increasingly complex to accommodate the growth of cloud services, especially in private cloud environments. In a mixed environment in which containerized applications run on virtual machines hosted on the physical machines of the cloud infrastructure, a single failure may simultaneously trigger multiple alarms across the system. Root cause localization therefore remains a daunting task. In this paper, we propose an automated, real-time root cause localization system named A-RCL, which takes a multi-layer approach to monitoring and localizing system incidents. We present a mechanism that combines machine-learning-based predictive methods to detect incidents in the system early and automatically identify their root cause. We implement and evaluate A-RCL on a comprehensive real private cloud testbed. The evaluation demonstrates that A-RCL achieves high accuracy of 93.99% and 98.12% in incident prediction and root cause localization, respectively.
ISSN:2374-9709
DOI:10.1109/NOMS56928.2023.10154309