
Elephant Flow Detection With Random Forest Models Under Programmable Network Dataplane Constraints

Bibliographic Details
Published in: IEEE Access 2024, Vol. 12, pp. 158561-158578
Main Authors: Jurkiewicz, Piotr; Kadziolka, Bartosz; Kantor, Miroslaw; Wojcik, Robert; Domzal, Jerzy
Format: Article
Language: English
container_end_page 158578
container_start_page 158561
container_title IEEE Access
container_volume 12
creator Jurkiewicz, Piotr
Kadziolka, Bartosz
Kantor, Miroslaw
Wojcik, Robert
Domzal, Jerzy
description This paper investigates the application of tree-based machine learning classifiers for flow-based traffic engineering, focusing on the binary classification of IP network flows into mice (short flows) and elephants (long flows) using 5-tuple header fields from the first packet. Unlike prior studies on network flow classification, our analysis uses performance metrics normalized by traffic coverage, ensuring relevance for traffic engineering and QoS applications. We also evaluate models within the constraints of programmable switching hardware, such as the Intel Tofino P4 chip. Our findings show that such constrained models can achieve high accuracy while performing inference at line rate in the dataplane. Additionally, we reveal a trade-off between tree depth and input format, with bit transformations enabling more efficient feature extraction at lower depths. Our results show that optimal tree depths range from 15 to 25 levels, depending on the input format. The most effective model employs extremely randomized trees with bit-transformed input and trees of depth 20.
doi_str_mv 10.1109/ACCESS.2024.3485588
format article
publisher IEEE, Piscataway
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
eissn 2169-3536
coden IAECCG
ieee_id 10731663
total_pages 18
peer_reviewed yes
open_access free to read
orcidid https://orcid.org/0000-0002-9562-606X
https://orcid.org/0000-0002-0774-610X
https://orcid.org/0000-0002-8379-2810
https://orcid.org/0000-0003-3093-9089
linktohtml https://ieeexplore.ieee.org/document/10731663
identifier ISSN: 2169-3536
ispartof IEEE Access, 2024, Vol. 12, pp. 158561-158578
issn 2169-3536
2169-3536
language eng
recordid cdi_crossref_primary_10_1109_ACCESS_2024_3485588
source IEEE Open Access Journals
subjects Accuracy
Classification
Computer networks
Constraints
Data models
Decision trees
elephant
Elephants
Flow production systems
Flows
Format
IP networks
Machine learning
Measurement
Performance measurement
QoS
Quality of service architectures
Random forests
Routing
SDN
Switches
Telecommunication traffic
Traffic control
Traffic engineering
Traffic flow
Training
title Elephant Flow Detection With Random Forest Models Under Programmable Network Dataplane Constraints
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A02%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Elephant%20Flow%20Detection%20With%20Random%20Forest%20Models%20Under%20Programmable%20Network%20Dataplane%20Constraints&rft.jtitle=IEEE%20access&rft.au=Jurkiewicz,%20Piotr&rft.date=2024&rft.volume=12&rft.spage=158561&rft.epage=158578&rft.pages=158561-158578&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3485588&rft_dat=%3Cproquest_cross%3E3123122420%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c244t-18bd134d12dcbff486cb72f35479a0bc7f00f30b5bafa1cd4b7c59672771ea4e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3123122420&rft_id=info:pmid/&rft_ieee_id=10731663&rfr_iscdi=true
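The description says flows are classified from the 5-tuple header fields of the first packet, and that bit transformations of that input let shallower trees extract features more efficiently. A minimal sketch of one such transformation follows, assuming IPv4 addresses, the field order src_ip, dst_ip, src_port, dst_port, proto, and a most-significant-bit-first layout (all three are assumptions, not taken from the paper). Each header bit becomes its own 0/1 feature, giving 32 + 32 + 16 + 16 + 8 = 104 features, so that a single tree split can test a single bit. The names `FIELD_WIDTHS`, `field_to_int` and `bit_transform` are hypothetical.

```python
import ipaddress

# Hypothetical layout (assumption): IPv4 5-tuple, fields concatenated in this
# order, each expanded most-significant bit first (32+32+16+16+8 = 104 bits).
FIELD_WIDTHS = (
    ("src_ip", 32),
    ("dst_ip", 32),
    ("src_port", 16),
    ("dst_port", 16),
    ("proto", 8),
)


def field_to_int(name, value):
    """Convert one 5-tuple field to an unsigned integer."""
    if name in ("src_ip", "dst_ip"):
        # ipaddress.IPv4Address accepts dotted-quad strings.
        return int(ipaddress.IPv4Address(value))
    return int(value)


def bit_transform(five_tuple):
    """Expand a 5-tuple (dict) into a flat list of 0/1 features."""
    bits = []
    for name, width in FIELD_WIDTHS:
        v = field_to_int(name, five_tuple[name])
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        # Emit bits from the most significant to the least significant.
        bits.extend((v >> (width - 1 - i)) & 1 for i in range(width))
    return bits


if __name__ == "__main__":
    flow = {
        "src_ip": "10.0.0.1",
        "dst_ip": "192.168.1.20",
        "src_port": 51515,
        "dst_port": 443,
        "proto": 6,  # TCP
    }
    features = bit_transform(flow)
    print(len(features))  # 104
```

For this example flow, the last eight features encode protocol 6 (TCP) as 0, 0, 0, 0, 0, 1, 1, 0. Such vectors could then be passed to a tree ensemble, for instance scikit-learn's `ExtraTreesClassifier(max_depth=20)`, which echoes the extremely randomized trees of depth 20 named in the description; that pairing is illustrative only and is not presented as the authors' pipeline or their Tofino implementation.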