
CauseInfer: Automatic and distributed performance diagnosis with hierarchical causality graph in large distributed systems


Bibliographic Details
Main Authors: Pengfei Chen, Yong Qi, Pengfei Zheng, Di Hou
Format: Conference Proceeding
Language: English
Description
Summary: Modern applications, especially cloud-based or cloud-centric ones, typically have many components running in large distributed environments with complex interactions. They are vulnerable to performance or availability problems caused by the highly dynamic runtime environment, such as resource hogs, configuration changes, and software bugs. To make software maintenance more efficient and provide hints about software bugs, we build CauseInfer, a low-cost, black-box cause inference system that does not require instrumenting the application source code. CauseInfer automatically constructs a two-layer hierarchical causality graph and infers the causes of performance problems along the causal paths in the graph using a series of statistical methods. In an experimental evaluation in a controlled environment, CauseInfer achieves an average of 80% precision and 85% recall in identifying root causes within a list of the top two candidate causes, which is higher than several state-of-the-art methods, and it scales well in distributed systems.
ISSN: 0743-166X, 2641-9874
DOI: 10.1109/INFOCOM.2014.6848128