
Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems

Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detec...

Bibliographic Details
Main Authors: Zhang, Shenglin, Pan, Zhongjie, Liu, Heng, Jin, Pengxiang, Sun, Yongqian, Ouyang, Qianyu, Wang, Jiaju, Jia, Xueying, Zhang, Yuzhi, Yang, Hui, Zou, Yongqiang, Pei, Dan
Format: Conference Proceeding
Language: English
container_end_page 79
container_start_page 69
creator Zhang, Shenglin
Pan, Zhongjie
Liu, Heng
Jin, Pengxiang
Sun, Yongqian
Ouyang, Qianyu
Wang, Jiaju
Jia, Xueying
Zhang, Yuzhi
Yang, Hui
Zou, Yongqiang
Pei, Dan
description Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detect anomalies. Additionally, trace data obtained from real-world scenarios is typically accompanied by noise, which can hinder the effectiveness of anomaly detection approaches. Furthermore, large-scale trace data can significantly impact model training efficiency. To address these challenges, we propose TraceSieve, an unsupervised trace anomaly detection method that accurately detects trace anomalies. Our approach leverages an auto-encoder architecture within an adversarial training framework to filter out noise data. Additionally, we integrate VGAE-EWC, which combines Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to overcome the challenges of enormous time consumption during the training phase. Finally, we localize the root cause of trace anomalies. Our proposed method is evaluated using two different datasets, and our results demonstrate that TraceSieve achieves an F1-score of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches.
doi_str_mv 10.1109/ISSRE59848.2023.00012
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2332-6549
ispartof 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, p.69-79
issn 2332-6549
language eng
recordid cdi_ieee_primary_10301258
source IEEE Xplore All Conference Series
subjects failure detection
Feature extraction
Location awareness
microservice
Microservice architectures
Noise reduction
Production
trace
Training
User experience
title Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T13%3A17%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Efficient%20and%20Robust%20Trace%20Anomaly%20Detection%20for%20Large-Scale%20Microservice%20Systems&rft.btitle=2023%20IEEE%2034th%20International%20Symposium%20on%20Software%20Reliability%20Engineering%20(ISSRE)&rft.au=Zhang,%20Shenglin&rft.date=2023-10-09&rft.spage=69&rft.epage=79&rft.pages=69-79&rft.eissn=2332-6549&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ISSRE59848.2023.00012&rft.eisbn=9798350315943&rft_dat=%3Cieee_CHZPO%3E10301258%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-8521c765a2107fa12ecc504b6fabdafc4cb737bd9084070c96eb38caf997beae3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10301258&rfr_iscdi=true