Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems
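Main Authors: | Zhao, Yujin; Jiang, Ling; Tao, Ye; Zhang, Songlin; Wu, Changlong; Jia, Tong; Huang, Xiaosong; Li, Ying; Wu, Zhonghai |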
---|---|
Format: | Conference Proceeding |
Language: | English |
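Subjects: | Bridges; Data mining; Economics; knowledge mining; online service system; root cause analysis; Software; software change; Software product lines; Software reliability; User experience; user-reported incident |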
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 297 |
container_issue | |
container_start_page | 287 |
container_title | |
container_volume | |
creator | Zhao, Yujin; Jiang, Ling; Tao, Ye; Zhang, Songlin; Wu, Changlong; Jia, Tong; Huang, Xiaosong; Li, Ying; Wu, Zhonghai |
description | In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness. |
doi_str_mv | 10.1109/ISSRE59848.2023.00028 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10301247</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10301247</ieee_id><sourcerecordid>10301247</sourcerecordid><originalsourceid>FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3</originalsourceid><addsrcrecordid>eNotjc1KAzEURqMgWGvfQCEvMONNbqaTLGWoOlAo9GddYnJTI22mTEZh3t6Krs7mO99h7FFAKQSYp3azWS8qo5UuJUgsAUDqKzYztdFYAYrKKLxmE4koi3mlzC27y_nzsgIl5IRtW09piGGM6cDXXTcUjf3KxJsPmw6Ueeh6vsvUF2s6d_1AnrfJxV8n85j4Kh1jIr6h_ju6C8c80Cnfs5tgj5lm_5yy3cti27wVy9Vr2zwvi3ipD4WvdABvAqKHdy9k7dA4B8Gh0qhcbUkrFwhMLQIFq7S21lRBugBzr2uLU_bw9xuJaH_u48n2414AgpCqxh9ga1Kt</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><source>IEEE Xplore All Conference Series</source><creator>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</creator><creatorcontrib>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</creatorcontrib><description>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. 
In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</description><identifier>EISSN: 2332-6549</identifier><identifier>EISBN: 9798350315943</identifier><identifier>DOI: 10.1109/ISSRE59848.2023.00028</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bridges ; Data mining ; Economics ; knowledge mining ; online service system ; root cause analysis ; Software ; software change ; Software product lines ; Software reliability ; User experience ; user-reported incident</subject><ispartof>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, 
p.287-297</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10301247$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10301247$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhao, Yujin</creatorcontrib><creatorcontrib>Jiang, Ling</creatorcontrib><creatorcontrib>Tao, Ye</creatorcontrib><creatorcontrib>Zhang, Songlin</creatorcontrib><creatorcontrib>Wu, Changlong</creatorcontrib><creatorcontrib>Jia, Tong</creatorcontrib><creatorcontrib>Huang, Xiaosong</creatorcontrib><creatorcontrib>Li, Ying</creatorcontrib><creatorcontrib>Wu, Zhonghai</creatorcontrib><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><title>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</title><addtitle>ISSRE</addtitle><description>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. 
To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</description><subject>Bridges</subject><subject>Data mining</subject><subject>Economics</subject><subject>knowledge mining</subject><subject>online service system</subject><subject>root cause analysis</subject><subject>Software</subject><subject>software change</subject><subject>Software product lines</subject><subject>Software reliability</subject><subject>User experience</subject><subject>user-reported incident</subject><issn>2332-6549</issn><isbn>9798350315943</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjc1KAzEURqMgWGvfQCEvMONNbqaTLGWoOlAo9GddYnJTI22mTEZh3t6Krs7mO99h7FFAKQSYp3azWS8qo5UuJUgsAUDqKzYztdFYAYrKKLxmE4koi3mlzC27y_nzsgIl5IRtW09piGGM6cDXXTcUjf3KxJsPmw6Ueeh6vsvUF2s6d_1AnrfJxV8n85j4Kh1jIr6h_ju6C8c80Cnfs5tgj5lm_5yy3cti27wVy9Vr2zwvi3ipD4WvdABvAqKHdy9k7dA4B8Gh0qhcbUkrFwhMLQIFq7S21lRBugBzr2uLU_bw9xuJaH_u48n2414AgpCqxh9ga1Kt</recordid><startdate>20231009</startdate><enddate>20231009</enddate><creator>Zhao, Yujin</creator><creator>Jiang, Ling</creator><creator>Tao, Ye</creator><creator>Zhang, Songlin</creator><creator>Wu, Changlong</creator><creator>Jia, 
Tong</creator><creator>Huang, Xiaosong</creator><creator>Li, Ying</creator><creator>Wu, Zhonghai</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20231009</creationdate><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><author>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bridges</topic><topic>Data mining</topic><topic>Economics</topic><topic>knowledge mining</topic><topic>online service system</topic><topic>root cause analysis</topic><topic>Software</topic><topic>software change</topic><topic>Software product lines</topic><topic>Software reliability</topic><topic>User experience</topic><topic>user-reported incident</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Yujin</creatorcontrib><creatorcontrib>Jiang, Ling</creatorcontrib><creatorcontrib>Tao, Ye</creatorcontrib><creatorcontrib>Zhang, Songlin</creatorcontrib><creatorcontrib>Wu, Changlong</creatorcontrib><creatorcontrib>Jia, Tong</creatorcontrib><creatorcontrib>Huang, Xiaosong</creatorcontrib><creatorcontrib>Li, Ying</creatorcontrib><creatorcontrib>Wu, Zhonghai</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library Online</collection><collection>IEEE Proceedings Order Plans (POP All) 
1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhao, Yujin</au><au>Jiang, Ling</au><au>Tao, Ye</au><au>Zhang, Songlin</au><au>Wu, Changlong</au><au>Jia, Tong</au><au>Huang, Xiaosong</au><au>Li, Ying</au><au>Wu, Zhonghai</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</atitle><btitle>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</btitle><stitle>ISSRE</stitle><date>2023-10-09</date><risdate>2023</risdate><spage>287</spage><epage>297</epage><pages>287-297</pages><eissn>2332-6549</eissn><eisbn>9798350315943</eisbn><coden>IEEPAD</coden><abstract>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. 
Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</abstract><pub>IEEE</pub><doi>10.1109/ISSRE59848.2023.00028</doi><tpages>11</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2332-6549 |
ispartof | 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, p.287-297 |
issn | 2332-6549 |
language | eng |
recordid | cdi_ieee_primary_10301247 |
source | IEEE Xplore All Conference Series |
subjects | Bridges; Data mining; Economics; knowledge mining; online service system; root cause analysis; Software; software change; Software product lines; Software reliability; User experience; user-reported incident |
title | Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T03%3A10%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Identifying%20Root-Cause%20Changes%20for%20User-Reported%20Incidents%20in%20Online%20Service%20Systems&rft.btitle=2023%20IEEE%2034th%20International%20Symposium%20on%20Software%20Reliability%20Engineering%20(ISSRE)&rft.au=Zhao,%20Yujin&rft.date=2023-10-09&rft.spage=287&rft.epage=297&rft.pages=287-297&rft.eissn=2332-6549&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ISSRE59848.2023.00028&rft.eisbn=9798350315943&rft_dat=%3Cieee_CHZPO%3E10301247%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10301247&rfr_iscdi=true |