Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems
Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detec...
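Main Authors: | Zhang, Shenglin; Pan, Zhongjie; Liu, Heng; Jin, Pengxiang; Sun, Yongqian; Ouyang, Qianyu; Wang, Jiaju; Jia, Xueying; Zhang, Yuzhi; Yang, Hui; Zou, Yongqiang; Pei, Dan |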
---|---|
Format: | Conference Proceeding |
Language: | English |
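Subjects: | failure detection; Feature extraction; Location awareness; microservice; Microservice architectures; Noise reduction; Production; trace; Training; User experience |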
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 79 |
container_issue | |
container_start_page | 69 |
container_title | |
container_volume | |
creator | Zhang, Shenglin; Pan, Zhongjie; Liu, Heng; Jin, Pengxiang; Sun, Yongqian; Ouyang, Qianyu; Wang, Jiaju; Jia, Xueying; Zhang, Yuzhi; Yang, Hui; Zou, Yongqiang; Pei, Dan |
description | Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detect anomalies. Additionally, trace data obtained from real-world scenarios is typically accompanied by noise, which can hinder the effectiveness of anomaly detection approaches. Furthermore, large-scale trace data can significantly impact model training efficiency. To address these challenges, we propose TraceSieve, an unsupervised trace anomaly detection method that accurately detects trace anomalies. Our approach leverages an auto-encoder architecture within an adversarial training framework to filter out noise data. Additionally, we integrate VGAE-EWC, which combines Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to overcome the challenges of enormous time consumption during the training phase. Finally, we localize the root cause of trace anomalies. Our proposed method is evaluated using two different datasets, and our results demonstrate that TraceSieve achieves an F1-score of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches. |
doi_str_mv | 10.1109/ISSRE59848.2023.00012 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10301258</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10301258</ieee_id><sourcerecordid>10301258</sourcerecordid><originalsourceid>FETCH-LOGICAL-i204t-8521c765a2107fa12ecc504b6fabdafc4cb737bd9084070c96eb38caf997beae3</originalsourceid><addsrcrecordid>eNotkM1KAzEURqMgWGvfQCEvMPXmb5IsSx1roSJ26rokmRuJtDOSjELf3oKuvs3hHPgIuWcwZwzsw7ptt42yRpo5By7mAMD4BZlZbY1QIJiyUlySCReCV7WS9prclPIJwEEyPiFvTYwpJOxH6vqObgf_XUa6yy4gXfTD0R1O9BFHDGMaehqHTDcuf2DVBndA-pJCHgrmn3TG21MZ8VhuyVV0h4Kz_52S96dmt3yuNq-r9XKxqdI5PVZGcRZ0rRxnoKNjHENQIH0dne9cDDJ4LbTvLBgJGoKt0QsTXLRWe3QopuTuz5sQcf-V09Hl056BOB-gjPgFDiFRYw</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems</title><source>IEEE Xplore All Conference Series</source><creator>Zhang, Shenglin ; Pan, Zhongjie ; Liu, Heng ; Jin, Pengxiang ; Sun, Yongqian ; Ouyang, Qianyu ; Wang, Jiaju ; Jia, Xueying ; Zhang, Yuzhi ; Yang, Hui ; Zou, Yongqiang ; Pei, Dan</creator><creatorcontrib>Zhang, Shenglin ; Pan, Zhongjie ; Liu, Heng ; Jin, Pengxiang ; Sun, Yongqian ; Ouyang, Qianyu ; Wang, Jiaju ; Jia, Xueying ; Zhang, Yuzhi ; Yang, Hui ; Zou, Yongqiang ; Pei, Dan</creatorcontrib><description>Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detect anomalies. Additionally, trace data obtained from real-world scenarios is typically accompanied by noise, which can hinder the effectiveness of anomaly detection approaches. 
Furthermore, large-scale trace data can significantly impact model training efficiency. To address these challenges, we propose TraceSieve, an unsupervised trace anomaly detection method that accurately detects trace anomalies. Our approach leverages an auto-encoder architecture within an adversarial training framework to filter out noise data. Additionally, we integrate VGAE-EWC, which combines Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to overcome the challenges of enormous time consumption during the training phase. Finally, we localize the root cause of trace anomalies. Our proposed method is evaluated using two different datasets, and our results demonstrate that TraceSieve achieves an F 1 -score of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches.</description><identifier>EISSN: 2332-6549</identifier><identifier>EISBN: 9798350315943</identifier><identifier>DOI: 10.1109/ISSRE59848.2023.00012</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>failure detection ; Feature extraction ; Location awareness ; microservice ; Microservice architectures ; Noise reduction ; Production ; trace ; Training ; User experience</subject><ispartof>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, p.69-79</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10301258$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27924,54554,54931</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10301258$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhang, Shenglin</creatorcontrib><creatorcontrib>Pan, 
Zhongjie</creatorcontrib><creatorcontrib>Liu, Heng</creatorcontrib><creatorcontrib>Jin, Pengxiang</creatorcontrib><creatorcontrib>Sun, Yongqian</creatorcontrib><creatorcontrib>Ouyang, Qianyu</creatorcontrib><creatorcontrib>Wang, Jiaju</creatorcontrib><creatorcontrib>Jia, Xueying</creatorcontrib><creatorcontrib>Zhang, Yuzhi</creatorcontrib><creatorcontrib>Yang, Hui</creatorcontrib><creatorcontrib>Zou, Yongqiang</creatorcontrib><creatorcontrib>Pei, Dan</creatorcontrib><title>Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems</title><title>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</title><addtitle>ISSRE</addtitle><description>Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detect anomalies. Additionally, trace data obtained from real-world scenarios is typically accompanied by noise, which can hinder the effectiveness of anomaly detection approaches. Furthermore, large-scale trace data can significantly impact model training efficiency. To address these challenges, we propose TraceSieve, an unsupervised trace anomaly detection method that accurately detects trace anomalies. Our approach leverages an auto-encoder architecture within an adversarial training framework to filter out noise data. Additionally, we integrate VGAE-EWC, which combines Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to overcome the challenges of enormous time consumption during the training phase. Finally, we localize the root cause of trace anomalies. 
Our proposed method is evaluated using two different datasets, and our results demonstrate that TraceSieve achieves an F 1 -score of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches.</description><subject>failure detection</subject><subject>Feature extraction</subject><subject>Location awareness</subject><subject>microservice</subject><subject>Microservice architectures</subject><subject>Noise reduction</subject><subject>Production</subject><subject>trace</subject><subject>Training</subject><subject>User experience</subject><issn>2332-6549</issn><isbn>9798350315943</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkM1KAzEURqMgWGvfQCEvMPXmb5IsSx1roSJ26rokmRuJtDOSjELf3oKuvs3hHPgIuWcwZwzsw7ptt42yRpo5By7mAMD4BZlZbY1QIJiyUlySCReCV7WS9prclPIJwEEyPiFvTYwpJOxH6vqObgf_XUa6yy4gXfTD0R1O9BFHDGMaehqHTDcuf2DVBndA-pJCHgrmn3TG21MZ8VhuyVV0h4Kz_52S96dmt3yuNq-r9XKxqdI5PVZGcRZ0rRxnoKNjHENQIH0dne9cDDJ4LbTvLBgJGoKt0QsTXLRWe3QopuTuz5sQcf-V09Hl056BOB-gjPgFDiFRYw</recordid><startdate>20231009</startdate><enddate>20231009</enddate><creator>Zhang, Shenglin</creator><creator>Pan, Zhongjie</creator><creator>Liu, Heng</creator><creator>Jin, Pengxiang</creator><creator>Sun, Yongqian</creator><creator>Ouyang, Qianyu</creator><creator>Wang, Jiaju</creator><creator>Jia, Xueying</creator><creator>Zhang, Yuzhi</creator><creator>Yang, Hui</creator><creator>Zou, Yongqiang</creator><creator>Pei, Dan</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20231009</creationdate><title>Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems</title><author>Zhang, Shenglin ; Pan, Zhongjie ; Liu, Heng ; Jin, Pengxiang ; Sun, Yongqian ; Ouyang, Qianyu ; Wang, Jiaju ; Jia, Xueying ; Zhang, Yuzhi ; Yang, 
Hui ; Zou, Yongqiang ; Pei, Dan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i204t-8521c765a2107fa12ecc504b6fabdafc4cb737bd9084070c96eb38caf997beae3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>failure detection</topic><topic>Feature extraction</topic><topic>Location awareness</topic><topic>microservice</topic><topic>Microservice architectures</topic><topic>Noise reduction</topic><topic>Production</topic><topic>trace</topic><topic>Training</topic><topic>User experience</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Shenglin</creatorcontrib><creatorcontrib>Pan, Zhongjie</creatorcontrib><creatorcontrib>Liu, Heng</creatorcontrib><creatorcontrib>Jin, Pengxiang</creatorcontrib><creatorcontrib>Sun, Yongqian</creatorcontrib><creatorcontrib>Ouyang, Qianyu</creatorcontrib><creatorcontrib>Wang, Jiaju</creatorcontrib><creatorcontrib>Jia, Xueying</creatorcontrib><creatorcontrib>Zhang, Yuzhi</creatorcontrib><creatorcontrib>Yang, Hui</creatorcontrib><creatorcontrib>Zou, Yongqiang</creatorcontrib><creatorcontrib>Pei, Dan</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhang, Shenglin</au><au>Pan, Zhongjie</au><au>Liu, Heng</au><au>Jin, Pengxiang</au><au>Sun, Yongqian</au><au>Ouyang, Qianyu</au><au>Wang, Jiaju</au><au>Jia, Xueying</au><au>Zhang, Yuzhi</au><au>Yang, Hui</au><au>Zou, Yongqiang</au><au>Pei, 
Dan</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems</atitle><btitle>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</btitle><stitle>ISSRE</stitle><date>2023-10-09</date><risdate>2023</risdate><spage>69</spage><epage>79</epage><pages>69-79</pages><eissn>2332-6549</eissn><eisbn>9798350315943</eisbn><coden>IEEPAD</coden><abstract>Microservice invocation anomalies can have a detrimental impact on user experience and service revenue. While existing trace anomaly detection approaches typically focus on anomalies in response time and invocation structure, they often overlook the importance of using fine-grained features to detect anomalies. Additionally, trace data obtained from real-world scenarios is typically accompanied by noise, which can hinder the effectiveness of anomaly detection approaches. Furthermore, large-scale trace data can significantly impact model training efficiency. To address these challenges, we propose TraceSieve, an unsupervised trace anomaly detection method that accurately detects trace anomalies. Our approach leverages an auto-encoder architecture within an adversarial training framework to filter out noise data. Additionally, we integrate VGAE-EWC, which combines Variational Graph Auto-Encoder (VGAE) with Elastic Weight Consolidation (EWC), to overcome the challenges of enormous time consumption during the training phase. Finally, we localize the root cause of trace anomalies. Our proposed method is evaluated using two different datasets, and our results demonstrate that TraceSieve achieves an F 1 -score of 0.970 and 0.925, respectively, outperforming state-of-the-art trace anomaly detection approaches.</abstract><pub>IEEE</pub><doi>10.1109/ISSRE59848.2023.00012</doi><tpages>11</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2332-6549 |
ispartof | 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, p.69-79 |
issn | 2332-6549 |
language | eng |
recordid | cdi_ieee_primary_10301258 |
source | IEEE Xplore All Conference Series |
subjects | failure detection; Feature extraction; Location awareness; microservice; Microservice architectures; Noise reduction; Production; trace; Training; User experience |
title | Efficient and Robust Trace Anomaly Detection for Large-Scale Microservice Systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T13%3A17%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Efficient%20and%20Robust%20Trace%20Anomaly%20Detection%20for%20Large-Scale%20Microservice%20Systems&rft.btitle=2023%20IEEE%2034th%20International%20Symposium%20on%20Software%20Reliability%20Engineering%20(ISSRE)&rft.au=Zhang,%20Shenglin&rft.date=2023-10-09&rft.spage=69&rft.epage=79&rft.pages=69-79&rft.eissn=2332-6549&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ISSRE59848.2023.00012&rft.eisbn=9798350315943&rft_dat=%3Cieee_CHZPO%3E10301258%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-8521c765a2107fa12ecc504b6fabdafc4cb737bd9084070c96eb38caf997beae3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10301258&rfr_iscdi=true |