TraceStream: Anomalous Service Localization based on Trace Stream Clustering with Online Feedback

Bibliographic Details
Main Authors: Zhou, Tong; Zhang, Chenxi; Peng, Xin; Yan, Zhenghui; Li, Pairui; Liang, Jianming; Zheng, Haibing; Zheng, Wujie; Deng, Yuetang
Format: Conference Proceeding
Language: English
Description
Summary: Modern large-scale service-based systems such as microservice systems have become increasingly complex, making it hard to localize anomalous services when various issues emerge. Traces record the workflows of requests through service instances and have been widely used in anomaly detection and root cause analysis. Existing trace-based approaches widely use statistical methods or learning-based techniques to detect trace anomalies and localize anomalous services. However, these approaches often suffer from the concept drift problem, i.e., the statistical properties of traces change over time in unforeseen ways. In this paper, we propose TraceStream, an anomalous service localization approach based on trace data stream clustering. TraceStream uses data stream clustering to discover potential anomalous trace clusters in evolving trace data and uses spectrum analysis to localize anomalous services based on the clusters. Moreover, TraceStream can effectively incorporate the online feedback of operation engineers based on the trace clusters to improve the accuracy for localizing anomalous services. Our evaluation confirms that TraceStream can effectively detect anomalies and localize anomalous services in an evolving microservice system. It can effectively incorporate human feedback to further improve the performance of anomalous service localization. Moreover, TraceStream is efficient and its efficiency can be further improved by sampling a small portion of traces by cluster.
ISSN: 2332-6549
DOI: 10.1109/ISSRE59848.2023.00033
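
Illustrative note: the summary says spectrum analysis is used to localize anomalous services from trace clusters, but it does not name a formula. The sketch below is one common way such an analysis could look, assuming the Ochiai suspiciousness score and assuming each trace has already been labeled anomalous or normal (for example, by cluster membership). The function name `ochiai_scores` and the input layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from math import sqrt


def ochiai_scores(traces):
    """Rank services by Ochiai suspiciousness.

    traces: iterable of (services, is_anomalous) pairs, where `services`
    lists the services a trace passed through. This input layout is an
    assumption made for illustration only.
    """
    traces = list(traces)
    total_anomalous = sum(1 for _, bad in traces if bad)

    hit_anomalous = defaultdict(int)  # anomalous traces covering a service
    hit_normal = defaultdict(int)     # normal traces covering a service
    for services, bad in traces:
        for service in set(services):  # count each service once per trace
            if bad:
                hit_anomalous[service] += 1
            else:
                hit_normal[service] += 1

    scores = {}
    for service in set(hit_anomalous) | set(hit_normal):
        covered = hit_anomalous[service] + hit_normal[service]
        denom = sqrt(total_anomalous * covered)
        scores[service] = hit_anomalous[service] / denom if denom else 0.0

    # Highest score first; ties broken by service name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    example = [
        (["gateway", "order", "payment"], True),
        (["gateway", "order", "payment"], True),
        (["gateway", "order"], False),
        (["gateway", "cart"], False),
    ]
    for service, score in ochiai_scores(example):
        print(f"{service}: {score:.3f}")
```

In this toy input, a service that appears only in anomalous traces ranks above services that are shared with normal traces, which is the intuition behind spectrum-based localization in general.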