Mining causality graph for automatic web-based service diagnosis

Bibliographic Details
Main Authors: Xiaohui Nie, Youjian Zhao, Kaixin Sui, Dan Pei, Yu Chen, Xianping Qu
Format: Conference Proceeding
Language: English
Description
Summary: It is crucial for Internet companies to provide highly reliable web-based services. These services typically comprise many components that run on large-scale infrastructure and interact in complex ways. Diagnosis is an indispensable part of high reliability, yet it remains a thorny problem that becomes even more difficult as system scale and complexity grow. In this paper, we propose an automatic diagnosis system based on a causality graph to help system operators find root causes. The causality graph is mainly extracted from the historical data of the monitoring system, and the method consists of four steps. 1) A data mining method extracts the initial causality graph. 2) Once a failure happens, a ranking algorithm based on the causality graph lists the top-k suspects. 3) System operators check the suspects and label each one as right or wrong. 4) A supervised learning algorithm takes the labels as input to tune the causality graph, iteratively improving the diagnosis accuracy of step 2. This method requires neither knowledge of the design and implementation details of the web-based service nor instrumentation of the services' source code. Our controlled experiments show that the root causes can be ranked in the top 3 with 100% accuracy after a limited number of learning iterations.
ISSN: 2374-9628
DOI: 10.1109/PCCC.2016.7820614
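
Illustration: the summary describes a loop of ranking suspects on a causality graph and tuning the graph from operator labels, but does not name the ranking or learning algorithms. The following is only a minimal sketch of steps 2 to 4 under stated assumptions: the graph is a dictionary of weighted cause-to-effect edges, a suspect's score is the strongest product of edge weights on any path to the failed metric, and feedback simply nudges a suspect's outgoing edge weights up or down. The function names, metric names, weights, and the `rate` parameter are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_suspects(graph, failed, k=3):
    """Return the top-k suspected causes of `failed`.

    Assumption: graph[cause][effect] is a weight in (0, 1]. A suspect's
    score is the largest product of weights along any causal path that
    ends at the failed metric.
    """
    # Reverse the edges so we can walk from an effect back to its causes.
    parents = defaultdict(list)
    for cause, effects in graph.items():
        for effect, weight in effects.items():
            parents[effect].append((cause, weight))

    best = {failed: 1.0}
    stack = [failed]
    while stack:
        node = stack.pop()
        for cause, weight in parents[node]:
            score = best[node] * weight
            # Revisit a cause only when a strictly stronger path is found,
            # which also guarantees termination on cyclic graphs.
            if score > best.get(cause, 0.0):
                best[cause] = score
                stack.append(cause)

    best.pop(failed)
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


def apply_feedback(graph, suspect, is_root_cause, rate=0.5):
    """Tune the graph from one operator label (a hypothetical update rule).

    Outgoing edges of a suspect labeled right move toward 1.0; those of a
    suspect labeled wrong are scaled down.
    """
    for effect, weight in graph.get(suspect, {}).items():
        if is_root_cause:
            graph[suspect][effect] = weight + rate * (1.0 - weight)
        else:
            graph[suspect][effect] = weight * (1.0 - rate)


# Hypothetical monitoring metrics and initial weights (stand-in for step 1).
graph = {
    "disk_io": {"db_latency": 0.9},
    "db_latency": {"api_errors": 0.4},
    "cache_miss": {"api_errors": 0.6},
    "api_errors": {"page_load_time": 0.8},
}

before = rank_suspects(graph, "page_load_time", k=3)     # step 2
apply_feedback(graph, "cache_miss", is_root_cause=False)  # step 3 labels
apply_feedback(graph, "db_latency", is_root_cause=True)
after = rank_suspects(graph, "page_load_time", k=3)      # step 4, re-rank
```

In this toy run the labels push `db_latency` and its upstream cause `disk_io` ahead of `cache_miss` in the second ranking. The sketch only shows the shape of the feedback loop and makes no claim about the accuracy or convergence results reported in the summary.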