Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems

Bibliographic Details
Main Authors: Zhao, Yujin, Jiang, Ling, Tao, Ye, Zhang, Songlin, Wu, Changlong, Jia, Tong, Huang, Xiaosong, Li, Ying, Wu, Zhonghai
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 297
container_issue
container_start_page 287
container_title
container_volume
creator Zhao, Yujin
Jiang, Ling
Tao, Ye
Zhang, Songlin
Wu, Changlong
Jia, Tong
Huang, Xiaosong
Li, Ying
Wu, Zhonghai
description In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.
doi_str_mv 10.1109/ISSRE59848.2023.00028
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10301247</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10301247</ieee_id><sourcerecordid>10301247</sourcerecordid><originalsourceid>FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3</originalsourceid><addsrcrecordid>eNotjc1KAzEURqMgWGvfQCEvMONNbqaTLGWoOlAo9GddYnJTI22mTEZh3t6Krs7mO99h7FFAKQSYp3azWS8qo5UuJUgsAUDqKzYztdFYAYrKKLxmE4koi3mlzC27y_nzsgIl5IRtW09piGGM6cDXXTcUjf3KxJsPmw6Ueeh6vsvUF2s6d_1AnrfJxV8n85j4Kh1jIr6h_ju6C8c80Cnfs5tgj5lm_5yy3cti27wVy9Vr2zwvi3ipD4WvdABvAqKHdy9k7dA4B8Gh0qhcbUkrFwhMLQIFq7S21lRBugBzr2uLU_bw9xuJaH_u48n2414AgpCqxh9ga1Kt</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><source>IEEE Xplore All Conference Series</source><creator>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</creator><creatorcontrib>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</creatorcontrib><description>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. 
In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</description><identifier>EISSN: 2332-6549</identifier><identifier>EISBN: 9798350315943</identifier><identifier>DOI: 10.1109/ISSRE59848.2023.00028</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bridges ; Data mining ; Economics ; knowledge mining ; online service system ; root cause analysis ; Software ; software change ; Software product lines ; Software reliability ; User experience ; user-reported incident</subject><ispartof>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, 
p.287-297</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10301247$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10301247$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Zhao, Yujin</creatorcontrib><creatorcontrib>Jiang, Ling</creatorcontrib><creatorcontrib>Tao, Ye</creatorcontrib><creatorcontrib>Zhang, Songlin</creatorcontrib><creatorcontrib>Wu, Changlong</creatorcontrib><creatorcontrib>Jia, Tong</creatorcontrib><creatorcontrib>Huang, Xiaosong</creatorcontrib><creatorcontrib>Li, Ying</creatorcontrib><creatorcontrib>Wu, Zhonghai</creatorcontrib><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><title>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</title><addtitle>ISSRE</addtitle><description>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. 
To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</description><subject>Bridges</subject><subject>Data mining</subject><subject>Economics</subject><subject>knowledge mining</subject><subject>online service system</subject><subject>root cause analysis</subject><subject>Software</subject><subject>software change</subject><subject>Software product lines</subject><subject>Software reliability</subject><subject>User experience</subject><subject>user-reported incident</subject><issn>2332-6549</issn><isbn>9798350315943</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjc1KAzEURqMgWGvfQCEvMONNbqaTLGWoOlAo9GddYnJTI22mTEZh3t6Krs7mO99h7FFAKQSYp3azWS8qo5UuJUgsAUDqKzYztdFYAYrKKLxmE4koi3mlzC27y_nzsgIl5IRtW09piGGM6cDXXTcUjf3KxJsPmw6Ueeh6vsvUF2s6d_1AnrfJxV8n85j4Kh1jIr6h_ju6C8c80Cnfs5tgj5lm_5yy3cti27wVy9Vr2zwvi3ipD4WvdABvAqKHdy9k7dA4B8Gh0qhcbUkrFwhMLQIFq7S21lRBugBzr2uLU_bw9xuJaH_u48n2414AgpCqxh9ga1Kt</recordid><startdate>20231009</startdate><enddate>20231009</enddate><creator>Zhao, Yujin</creator><creator>Jiang, Ling</creator><creator>Tao, Ye</creator><creator>Zhang, Songlin</creator><creator>Wu, Changlong</creator><creator>Jia, 
Tong</creator><creator>Huang, Xiaosong</creator><creator>Li, Ying</creator><creator>Wu, Zhonghai</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20231009</creationdate><title>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</title><author>Zhao, Yujin ; Jiang, Ling ; Tao, Ye ; Zhang, Songlin ; Wu, Changlong ; Jia, Tong ; Huang, Xiaosong ; Li, Ying ; Wu, Zhonghai</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bridges</topic><topic>Data mining</topic><topic>Economics</topic><topic>knowledge mining</topic><topic>online service system</topic><topic>root cause analysis</topic><topic>Software</topic><topic>software change</topic><topic>Software product lines</topic><topic>Software reliability</topic><topic>User experience</topic><topic>user-reported incident</topic><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Yujin</creatorcontrib><creatorcontrib>Jiang, Ling</creatorcontrib><creatorcontrib>Tao, Ye</creatorcontrib><creatorcontrib>Zhang, Songlin</creatorcontrib><creatorcontrib>Wu, Changlong</creatorcontrib><creatorcontrib>Jia, Tong</creatorcontrib><creatorcontrib>Huang, Xiaosong</creatorcontrib><creatorcontrib>Li, Ying</creatorcontrib><creatorcontrib>Wu, Zhonghai</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library Online</collection><collection>IEEE Proceedings Order Plans (POP All) 
1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zhao, Yujin</au><au>Jiang, Ling</au><au>Tao, Ye</au><au>Zhang, Songlin</au><au>Wu, Changlong</au><au>Jia, Tong</au><au>Huang, Xiaosong</au><au>Li, Ying</au><au>Wu, Zhonghai</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems</atitle><btitle>2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE)</btitle><stitle>ISSRE</stitle><date>2023-10-09</date><risdate>2023</risdate><spage>287</spage><epage>297</epage><pages>287-297</pages><eissn>2332-6549</eissn><eisbn>9798350315943</eisbn><coden>IEEPAD</coden><abstract>In online service systems, a majority of incidents are caused by changes, which can influence user experience and cause huge economic loss. Experiences with a real-world, large-scale online service system show that more than half of the change-induced incidents are reported by users. Identifying root-cause changes for these incidents is challenging due to the inherent gap between user-perceived functional-level incident information and component-level change details. Inadequate causal knowledge also brings challenges. In this paper, we propose a novel causal knowledge mining based approach aiming at root-cause change identification for user-reported incidents named Raccoon. To bridge the gap between incidents and changes, it utilizes the fault tree and software product line to represent incidents and changes at the user-perceived functional level. They are also used as the backbone of causal knowledge. To overcome the lack of causal knowledge, Raccoon adopts efficient knowledge extraction and inference methods. 
Moreover, Raccoon provides recommendations at the software product line and change granularity to meet diverse demands of incident triage and root-cause change identification scenarios in incident management. We evaluate Raccoon on a real-world dataset collected in a large-scale online service system. The result shows that Raccoon significantly outperforms the state-of-the-art baseline approaches, which proves its effectiveness.</abstract><pub>IEEE</pub><doi>10.1109/ISSRE59848.2023.00028</doi><tpages>11</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2332-6549
ispartof 2023 IEEE 34th International Symposium on Software Reliability Engineering (ISSRE), 2023, p.287-297
issn 2332-6549
language eng
recordid cdi_ieee_primary_10301247
source IEEE Xplore All Conference Series
subjects Bridges
Data mining
Economics
knowledge mining
online service system
root cause analysis
Software
software change
Software product lines
Software reliability
User experience
user-reported incident
title Identifying Root-Cause Changes for User-Reported Incidents in Online Service Systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T03%3A10%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Identifying%20Root-Cause%20Changes%20for%20User-Reported%20Incidents%20in%20Online%20Service%20Systems&rft.btitle=2023%20IEEE%2034th%20International%20Symposium%20on%20Software%20Reliability%20Engineering%20(ISSRE)&rft.au=Zhao,%20Yujin&rft.date=2023-10-09&rft.spage=287&rft.epage=297&rft.pages=287-297&rft.eissn=2332-6549&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ISSRE59848.2023.00028&rft.eisbn=9798350315943&rft_dat=%3Cieee_CHZPO%3E10301247%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-d58f0d9f33d0bd127c39cc0fc34834c7ae84cfe0971fefa488aa95f2cf06d87a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10301247&rfr_iscdi=true