Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks

As cyber attacks on the Internet grow, demand for accurate intrusion detection systems (IDS) to counter them is increasing. To this end, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, their applicability scope...

Bibliographic Details
Published in: Scientific reports 2022-02, Vol.12 (1), p.2091-2091, Article 2091
Main Authors: Mozo, Alberto; González-Prieto, Ángel; Pastor, Antonio; Gómez-Canaval, Sandra; Talavera, Edgar
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3
cites cdi_FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3
container_end_page 2091
container_issue 1
container_start_page 2091
container_title Scientific reports
container_volume 12
creator Mozo, Alberto
González-Prieto, Ángel
Pastor, Antonio
Gómez-Canaval, Sandra
Talavera, Edgar
description As cyber attacks on the Internet grow, demand for accurate intrusion detection systems (IDS) to counter them is increasing. To this end, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, their applicability is limited by two important issues: (i) the shortage of network traffic datasets for attack analysis, and (ii) the privacy constraints on the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, because GAN training converges poorly, none of the existing solutions can generate high-quality, fully synthetic data that can completely replace real data in the training of ML components. Instead, they mix real and synthetic data, so the synthetic data serves only as data augmentation, and privacy breaches remain possible because real data is still used. By contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN, both with respect to the real data and with respect to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics to select the best-performing generator during GAN training, leading to a novel stopping criterion that can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining-attack and normal-traffic flow-based data using an enhanced version of a Wasserstein GAN. The results show that the generated synthetic network traffic can completely replace real data when training an ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.
doi_str_mv 10.1038/s41598-022-06057-2
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_cfa22fa15e2a4471b6b069b148161e9c</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_cfa22fa15e2a4471b6b069b148161e9c</doaj_id><sourcerecordid>2627134686</sourcerecordid><originalsourceid>FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3</originalsourceid><addsrcrecordid>eNp9kk1v1DAQhiMEolXpH-CAInHhEmqPP9a5IFVVKZUqOAAnDpbjTLLZZu3Fdrbaf4-32ZaWA77Ymnnn8Yz9FsVbSj5SwtRZ5FTUqiIAFZFELCp4URwD4aICBvDyyfmoOI1xRfISUHNavy6OmKBMUs6Pi1_fdy4tMQ227EZ_VzUmYlvasNskvx7c4PrSpGTsbdmjw2DS4F2ZlsFP_bK8OoS2WJ63WwzRhMGM5VdMdz7cxjfFq86MEU8P-0nx8_Plj4sv1c23q-uL85vKCk5SxRhBCkR2Chtl69xaaxQaBdBRhbbjhDAuFu1CAWVWqE4gWGStaGuJNVh2UlzP3Nabld6EYW3CTnsz6PuAD702IU84oradyVhDM8JwvqCNbIisG8oVlRTrPevTzNpMzRpbiy4FMz6DPs-4Yal7v9VKgVCcZ8CHAyD43xPGpNdDtDiOxqGfogYJC8q4VDJL3_8jXfkpuPxUe5UUkinBsgpmlQ0-xoDdYzOU6L0V9GwFna2g762gIRe9ezrGY8nDx2cBmwUxp1yP4e_d_8H-Af-xv_M</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2626563853</pqid></control><display><type>article</type><title>Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks</title><source>Full-Text Journals in Chemistry (Open access)</source><source>Publicly Available Content (ProQuest)</source><source>PubMed Central</source><source>Springer Nature - nature.com Journals - Fully Open Access</source><creator>Mozo, Alberto ; González-Prieto, Ángel ; Pastor, Antonio ; Gómez-Canaval, Sandra ; Talavera, Edgar</creator><creatorcontrib>Mozo, Alberto ; González-Prieto, Ángel ; Pastor, Antonio ; Gómez-Canaval, Sandra ; Talavera, Edgar</creatorcontrib><description>Due to the growing rise of cyber attacks in the Internet, the demand of accurate intrusion detection systems (IDS) to prevent these vulnerabilities is increasing. 
To this aim, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, its applicability scope is limited by two important issues: (i) the shortage of network traffic data datasets for attack analysis, and (ii) the data privacy constraints of the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of the GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of ML components. In contrast, they mix real with synthetic data, which acts only as data augmentation components, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. 
The results evidence that the generated synthetic network traffic can completely replace real data when training a ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.</description><identifier>ISSN: 2045-2322</identifier><identifier>EISSN: 2045-2322</identifier><identifier>DOI: 10.1038/s41598-022-06057-2</identifier><identifier>PMID: 35136144</identifier><language>eng</language><publisher>London: Nature Publishing Group UK</publisher><subject>639/705/1042 ; 639/705/117 ; 639/705/258 ; Humanities and Social Sciences ; Learning algorithms ; Machine learning ; multidisciplinary ; Privacy ; Science ; Science (multidisciplinary)</subject><ispartof>Scientific reports, 2022-02, Vol.12 (1), p.2091-2091, Article 2091</ispartof><rights>The Author(s) 2022</rights><rights>2022. The Author(s).</rights><rights>The Author(s) 2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3</citedby><cites>FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/2626563853/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2626563853?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25752,27923,27924,37011,37012,44589,53790,53792,74897</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/35136144$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>González-Prieto, Ángel</creatorcontrib><creatorcontrib>Pastor, Antonio</creatorcontrib><creatorcontrib>Gómez-Canaval, Sandra</creatorcontrib><creatorcontrib>Talavera, Edgar</creatorcontrib><title>Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks</title><title>Scientific reports</title><addtitle>Sci Rep</addtitle><addtitle>Sci Rep</addtitle><description>Due to the growing rise of cyber attacks in the Internet, the demand of accurate intrusion detection systems (IDS) to prevent these vulnerabilities is increasing. To this aim, Machine Learning (ML) components have been proposed as an efficient and effective solution. 
However, its applicability scope is limited by two important issues: (i) the shortage of network traffic data datasets for attack analysis, and (ii) the data privacy constraints of the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of the GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of ML components. In contrast, they mix real with synthetic data, which acts only as data augmentation components, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. 
The results evidence that the generated synthetic network traffic can completely replace real data when training a ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.</description><subject>639/705/1042</subject><subject>639/705/117</subject><subject>639/705/258</subject><subject>Humanities and Social Sciences</subject><subject>Learning algorithms</subject><subject>Machine learning</subject><subject>multidisciplinary</subject><subject>Privacy</subject><subject>Science</subject><subject>Science (multidisciplinary)</subject><issn>2045-2322</issn><issn>2045-2322</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNp9kk1v1DAQhiMEolXpH-CAInHhEmqPP9a5IFVVKZUqOAAnDpbjTLLZZu3Fdrbaf4-32ZaWA77Ymnnn8Yz9FsVbSj5SwtRZ5FTUqiIAFZFELCp4URwD4aICBvDyyfmoOI1xRfISUHNavy6OmKBMUs6Pi1_fdy4tMQ227EZ_VzUmYlvasNskvx7c4PrSpGTsbdmjw2DS4F2ZlsFP_bK8OoS2WJ63WwzRhMGM5VdMdz7cxjfFq86MEU8P-0nx8_Plj4sv1c23q-uL85vKCk5SxRhBCkR2Chtl69xaaxQaBdBRhbbjhDAuFu1CAWVWqE4gWGStaGuJNVh2UlzP3Nabld6EYW3CTnsz6PuAD702IU84oradyVhDM8JwvqCNbIisG8oVlRTrPevTzNpMzRpbiy4FMz6DPs-4Yal7v9VKgVCcZ8CHAyD43xPGpNdDtDiOxqGfogYJC8q4VDJL3_8jXfkpuPxUe5UUkinBsgpmlQ0-xoDdYzOU6L0V9GwFna2g762gIRe9ezrGY8nDx2cBmwUxp1yP4e_d_8H-Af-xv_M</recordid><startdate>20220208</startdate><enddate>20220208</enddate><creator>Mozo, Alberto</creator><creator>González-Prieto, Ángel</creator><creator>Pastor, Antonio</creator><creator>Gómez-Canaval, Sandra</creator><creator>Talavera, Edgar</creator><general>Nature Publishing Group UK</general><general>Nature Publishing Group</general><general>Nature 
Portfolio</general><scope>C6C</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88A</scope><scope>88E</scope><scope>88I</scope><scope>8FE</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>K9.</scope><scope>LK8</scope><scope>M0S</scope><scope>M1P</scope><scope>M2P</scope><scope>M7P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20220208</creationdate><title>Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks</title><author>Mozo, Alberto ; González-Prieto, Ángel ; Pastor, Antonio ; Gómez-Canaval, Sandra ; Talavera, Edgar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>639/705/1042</topic><topic>639/705/117</topic><topic>639/705/258</topic><topic>Humanities and Social Sciences</topic><topic>Learning algorithms</topic><topic>Machine learning</topic><topic>multidisciplinary</topic><topic>Privacy</topic><topic>Science</topic><topic>Science (multidisciplinary)</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mozo, Alberto</creatorcontrib><creatorcontrib>González-Prieto, Ángel</creatorcontrib><creatorcontrib>Pastor, Antonio</creatorcontrib><creatorcontrib>Gómez-Canaval, Sandra</creatorcontrib><creatorcontrib>Talavera, 
Edgar</creatorcontrib><collection>SpringerOpen</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Biology Database (Alumni Edition)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Science Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Biological Sciences</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>ProQuest Science Journals</collection><collection>Biological Science Database</collection><collection>Publicly Available Content (ProQuest)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One 
Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Directory of Open Access Journals</collection><jtitle>Scientific reports</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mozo, Alberto</au><au>González-Prieto, Ángel</au><au>Pastor, Antonio</au><au>Gómez-Canaval, Sandra</au><au>Talavera, Edgar</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks</atitle><jtitle>Scientific reports</jtitle><stitle>Sci Rep</stitle><addtitle>Sci Rep</addtitle><date>2022-02-08</date><risdate>2022</risdate><volume>12</volume><issue>1</issue><spage>2091</spage><epage>2091</epage><pages>2091-2091</pages><artnum>2091</artnum><issn>2045-2322</issn><eissn>2045-2322</eissn><abstract>Due to the growing rise of cyber attacks in the Internet, the demand of accurate intrusion detection systems (IDS) to prevent these vulnerabilities is increasing. To this aim, Machine Learning (ML) components have been proposed as an efficient and effective solution. However, its applicability scope is limited by two important issues: (i) the shortage of network traffic data datasets for attack analysis, and (ii) the data privacy constraints of the data to be used. To overcome these problems, Generative Adversarial Networks (GANs) have been proposed for synthetic flow-based network traffic generation. However, due to the ill-convergence of the GAN training, none of the existing solutions can generate high-quality fully synthetic data that can totally substitute real data in the training of ML components. 
In contrast, they mix real with synthetic data, which acts only as data augmentation components, leading to privacy breaches as real data is used. In sharp contrast, in this work we propose a novel and deterministic way to measure the quality of the synthetic data produced by a GAN both with respect to the real data and to its performance when used for ML tasks. As a by-product, we present a heuristic that uses these metrics for selecting the best performing generator during GAN training, leading to a novel stopping criterion, which can be applied even when different types of synthetic data are to be used in the same ML task. We demonstrate the adequacy of our proposal by generating synthetic cryptomining attacks and normal traffic flow-based data using an enhanced version of a Wasserstein GAN. The results evidence that the generated synthetic network traffic can completely replace real data when training a ML-based cryptomining detector, obtaining similar performance and avoiding privacy violations, since real data is not used in the training of the ML-based detector.</abstract><cop>London</cop><pub>Nature Publishing Group UK</pub><pmid>35136144</pmid><doi>10.1038/s41598-022-06057-2</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2045-2322
ispartof Scientific reports, 2022-02, Vol.12 (1), p.2091-2091, Article 2091
issn 2045-2322
2045-2322
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_cfa22fa15e2a4471b6b069b148161e9c
source Full-Text Journals in Chemistry (Open access); Publicly Available Content (ProQuest); PubMed Central; Springer Nature - nature.com Journals - Fully Open Access
subjects 639/705/1042
639/705/117
639/705/258
Humanities and Social Sciences
Learning algorithms
Machine learning
multidisciplinary
Privacy
Science
Science (multidisciplinary)
title Synthetic flow-based cryptomining attack generation through Generative Adversarial Networks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-13T03%3A51%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Synthetic%20flow-based%20cryptomining%20attack%20generation%20through%20Generative%20Adversarial%20Networks&rft.jtitle=Scientific%20reports&rft.au=Mozo,%20Alberto&rft.date=2022-02-08&rft.volume=12&rft.issue=1&rft.spage=2091&rft.epage=2091&rft.pages=2091-2091&rft.artnum=2091&rft.issn=2045-2322&rft.eissn=2045-2322&rft_id=info:doi/10.1038/s41598-022-06057-2&rft_dat=%3Cproquest_doaj_%3E2627134686%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c540t-330e1206f8eb8c9513da8ea822f18ecf4003457d78213c58f5e2ce3d5d96e92c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2626563853&rft_id=info:pmid/35136144&rfr_iscdi=true