
Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems

Bibliographic Details
Published in: IEEE Access, 2022, Vol. 10, pp. 53972-53986
Main Authors: Ishibashi, Ryosuke, Miyamoto, Kohei, Han, Chansu, Ban, Tao, Takahashi, Takeshi, Takeuchi, Jun'ichi
Format: Article
Language: English
container_end_page 53986
container_start_page 53972
container_title IEEE Access
container_volume 10
creator Ishibashi, Ryosuke
Miyamoto, Kohei
Han, Chansu
Ban, Tao
Takahashi, Takeshi
Takeuchi, Jun'ichi
description As cyberattacks against enterprise networks have become more diverse and sophisticated, it is crucial to implement innovative artificial intelligence (AI)-powered network intrusion detection systems (NIDSes) to protect these networks. Training AI-powered NIDSes requires high-quality labeled datasets, but such datasets are scarce worldwide, and generating new ones is considered cumbersome. In this study, we investigate an approach that combines the strengths of existing security appliances to generate labeled training datasets that can be used to develop new AI-powered cybersecurity solutions. We first locate the communication flows that the deployed NIDSes detect as suspicious, investigate their causal factors, and assign appropriate labels in a universal format. We then output the packet data of the identified communication flows, together with the corresponding alert-type labels, as labeled data. We demonstrate the effectiveness of this labeling scheme by evaluating classification models trained on the labeled dataset we generated. Furthermore, we present case studies that examine the performance of several commonly used NIDSes and practical approaches to automating the security triage process. To ensure the reproducibility of the results, the labeled datasets are generated from public datasets using open-source NIDSes. The datasets and software tools are publicly available for research use.
doi_str_mv 10.1109/ACCESS.2022.3176098
format article
publisher Piscataway: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
coden IAECCG
tpages 15
toplevel peer_reviewed
oa free_for_read
linktohtml https://ieeexplore.ieee.org/document/9777676
orcidid https://orcid.org/0000-0002-6477-7770
https://orcid.org/0000-0002-0977-4155
https://orcid.org/0000-0002-1728-5300
https://orcid.org/0000-0002-9616-3212
https://orcid.org/0000-0002-5819-3082
identifier ISSN: 2169-3536
ispartof IEEE Access, 2022, Vol. 10, pp. 53972-53986
issn 2169-3536
2169-3536
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_ca5b571c8e454d2bb3bb7aa5b73b4eb6
source IEEE Open Access Journals
subjects Artificial intelligence
Cybersecurity
Data models
Datasets
Feature extraction
Intrusion detection systems
IP networks
Labels
Network intrusion detection
Network intrusion detection system
packet data analysis
packet replay
public dataset
Reproducibility
Reproducibility of results
Security
security alert
security data labeling
Software
Software development tools
Training
title Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T16%3A32%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Generating%20Labeled%20Training%20Datasets%20Towards%20Unified%20Network%20Intrusion%20Detection%20Systems&rft.jtitle=IEEE%20access&rft.au=Ishibashi,%20Ryosuke&rft.date=2022&rft.volume=10&rft.spage=53972&rft.epage=53986&rft.pages=53972-53986&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2022.3176098&rft_dat=%3Cproquest_doaj_%3E2669159094%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c408t-9669bc7a69542b948ebd3acf354df5425c2fae9037069c12a662212016ac6de13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2669159094&rft_id=info:pmid/&rft_ieee_id=9777676&rfr_iscdi=true
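The core labeling step described in the abstract (locate the flows that deployed NIDSes flag as suspicious, then export each flow's packets with its alert-type label) can be illustrated with a minimal Python sketch. It is not the authors' published tool: the `Packet` and `Alert` records, the direction-independent 5-tuple flow key, the rule of joining multiple alert types with `+`, and the example label `web-application-attack` are hypothetical choices made for illustration. Real NIDS alert formats, and the paper's universal label format, will differ.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical flow key: (protocol, endpoint, endpoint) with the two endpoints
# sorted so that both directions of a conversation map to the same key.
FlowKey = Tuple[str, Tuple[str, int], Tuple[str, int]]


@dataclass(frozen=True)
class Packet:
    """Hypothetical stand-in for a parsed packet."""
    ts: float
    proto: str
    src: str
    sport: int
    dst: str
    dport: int
    payload: bytes


@dataclass(frozen=True)
class Alert:
    """Hypothetical stand-in for an NIDS alert; real formats differ per NIDS."""
    proto: str
    src: str
    sport: int
    dst: str
    dport: int
    alert_type: str  # hypothetical normalized alert-type label


def flow_key(proto: str, src: str, sport: int, dst: str, dport: int) -> FlowKey:
    """Build a direction-independent 5-tuple key."""
    a, b = (src, sport), (dst, dport)
    return (proto, min(a, b), max(a, b))


def label_flows(packets: List[Packet], alerts: List[Alert]) -> List[Tuple[str, List[Packet]]]:
    """Return (label, packets) for every flow that at least one alert refers to.

    Flows without any alert are skipped, so only the flows flagged as
    suspicious are exported. Alerts with no matching packets are also skipped.
    """
    labels: Dict[FlowKey, Set[str]] = defaultdict(set)
    for al in alerts:
        labels[flow_key(al.proto, al.src, al.sport, al.dst, al.dport)].add(al.alert_type)

    flows: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p.proto, p.src, p.sport, p.dst, p.dport)].append(p)

    labeled: List[Tuple[str, List[Packet]]] = []
    for key, types in labels.items():
        pkts = sorted(flows.get(key, []), key=lambda p: p.ts)
        if pkts:
            # Hypothetical rule: several alert types on one flow are joined into one label.
            labeled.append(("+".join(sorted(types)), pkts))
    return labeled


if __name__ == "__main__":
    demo_packets = [
        Packet(0.0, "tcp", "10.0.0.5", 40000, "192.0.2.7", 80, b"GET /"),
        Packet(0.1, "tcp", "192.0.2.7", 80, "10.0.0.5", 40000, b"200 OK"),
        Packet(0.2, "udp", "10.0.0.6", 5353, "10.0.0.255", 5353, b"mdns"),
    ]
    demo_alerts = [Alert("tcp", "10.0.0.5", 40000, "192.0.2.7", 80, "web-application-attack")]
    for label, flow in label_flows(demo_packets, demo_alerts):
        print(label, len(flow))  # web-application-attack 2
```

Keying flows on a direction-independent 5-tuple lets an alert raised on either direction of a conversation label the whole flow. A real pipeline would also need to separate distinct connections that reuse the same 5-tuple over time, which this sketch ignores.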