A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques
Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to...
Published in: | arXiv.org 2023-09 |
---|---|
Main Authors: | Landauer, Max; Skopik, Florian; Wurzenberger, Markus |
Format: | Article |
Language: | English |
Subjects: | Anomalies; Data storage; Datasets |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Landauer, Max; Skopik, Florian; Wurzenberger, Markus |
description | Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques; however, the appropriateness of these data sets has not been closely investigated in the past. In this paper, we therefore analyze six publicly available log data sets with a focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets. |
doi_str_mv | 10.48550/arxiv.2309.02854 |
format | article |
fullrecord | <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2861990144</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2861990144</sourcerecordid><originalsourceid>FETCH-LOGICAL-a954-c06cc28f5effe80c97cf219a003e88e4deca6f247a0c487f638939f03262f0083</originalsourceid><addsrcrecordid>eNotzU1Lw0AUheFBECy1P8DdgOvUmzszycyypPUDCoKN63Kd3tGUJKNJWvXfm6qrs3k4rxBXKcy1NQZuqPuqjnNU4OaA1ugzMUGl0sRqxAsx6_s9AGCWozFqIuqFLLpqqDzV8omPFX_KGGQRmya2ch1f5ZIGkhseevnc806G2MnVkeoDDdUoRrvhjwO3npMXOoFFGxuqv-WSB_a_pmT_1lYj6i_FeaC659n_TkV5uyqL-2T9ePdQLNYJOaMTD5n3aIPhENiCd7kPmDoCUGwt6x17ygLqnMBrm4dMWadcAIUZBgCrpuL67_a9i6fssN3HQ9eOxS3aLHUOUq3VD_QrWcs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2861990144</pqid></control><display><type>article</type><title>A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques</title><source>Publicly Available Content Database</source><creator>Landauer, Max ; Skopik, Florian ; Wurzenberger, Markus</creator><creatorcontrib>Landauer, Max ; Skopik, Florian ; Wurzenberger, Markus</creatorcontrib><description>Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. 
Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques, however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.</description><identifier>EISSN: 2331-8422</identifier><identifier>DOI: 10.48550/arxiv.2309.02854</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Anomalies ; Data storage ; Datasets</subject><ispartof>arXiv.org, 2023-09</ispartof><rights>2023. This work is published under http://creativecommons.org/licenses/by-nc-sa/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2861990144?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>780,784,25753,27925,37012,44590</link.rule.ids></links><search><creatorcontrib>Landauer, Max</creatorcontrib><creatorcontrib>Skopik, Florian</creatorcontrib><creatorcontrib>Wurzenberger, Markus</creatorcontrib><title>A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques</title><title>arXiv.org</title><description>Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques, however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection. 
Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.</description><subject>Anomalies</subject><subject>Data storage</subject><subject>Datasets</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNotzU1Lw0AUheFBECy1P8DdgOvUmzszycyypPUDCoKN63Kd3tGUJKNJWvXfm6qrs3k4rxBXKcy1NQZuqPuqjnNU4OaA1ugzMUGl0sRqxAsx6_s9AGCWozFqIuqFLLpqqDzV8omPFX_KGGQRmya2ch1f5ZIGkhseevnc806G2MnVkeoDDdUoRrvhjwO3npMXOoFFGxuqv-WSB_a_pmT_1lYj6i_FeaC659n_TkV5uyqL-2T9ePdQLNYJOaMTD5n3aIPhENiCd7kPmDoCUGwt6x17ygLqnMBrm4dMWadcAIUZBgCrpuL67_a9i6fssN3HQ9eOxS3aLHUOUq3VD_QrWcs</recordid><startdate>20230906</startdate><enddate>20230906</enddate><creator>Landauer, Max</creator><creator>Skopik, Florian</creator><creator>Wurzenberger, Markus</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20230906</creationdate><title>A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques</title><author>Landauer, Max ; Skopik, Florian ; Wurzenberger, Markus</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a954-c06cc28f5effe80c97cf219a003e88e4deca6f247a0c487f638939f03262f0083</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Anomalies</topic><topic>Data 
storage</topic><topic>Datasets</topic><toplevel>online_resources</toplevel><creatorcontrib>Landauer, Max</creatorcontrib><creatorcontrib>Skopik, Florian</creatorcontrib><creatorcontrib>Wurzenberger, Markus</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>arXiv.org</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Landauer, Max</au><au>Skopik, Florian</au><au>Wurzenberger, Markus</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques</atitle><jtitle>arXiv.org</jtitle><date>2023-09-06</date><risdate>2023</risdate><eissn>2331-8422</eissn><abstract>Log data store event execution patterns that correspond to underlying workflows of systems or applications. While most logs are informative, log data also include artifacts that indicate failures or incidents. 
Accordingly, log data are often used to evaluate anomaly detection techniques that aim to automatically disclose unexpected or otherwise relevant system behavior patterns. Recently, detection approaches leveraging deep learning have increasingly focused on anomalies that manifest as changes of sequential patterns within otherwise normal event traces. Several publicly available data sets, such as HDFS, BGL, Thunderbird, OpenStack, and Hadoop, have since become standards for evaluating these anomaly detection techniques, however, the appropriateness of these data sets has not been closely investigated in the past. In this paper we therefore analyze six publicly available log data sets with focus on the manifestations of anomalies and simple techniques for their detection. Our findings suggest that most anomalies are not directly related to sequential manifestations and that advanced detection techniques are not required to achieve high detection rates on these data sets.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><doi>10.48550/arxiv.2309.02854</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-09 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2861990144 |
source | Publicly Available Content Database |
subjects | Anomalies; Data storage; Datasets |
title | A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T14%3A55%3A06IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20Critical%20Review%20of%20Common%20Log%20Data%20Sets%20Used%20for%20Evaluation%20of%20Sequence-based%20Anomaly%20Detection%20Techniques&rft.jtitle=arXiv.org&rft.au=Landauer,%20Max&rft.date=2023-09-06&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2309.02854&rft_dat=%3Cproquest%3E2861990144%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a954-c06cc28f5effe80c97cf219a003e88e4deca6f247a0c487f638939f03262f0083%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2861990144&rft_id=info:pmid/&rfr_iscdi=true |