Elephant Flow Detection With Random Forest Models Under Programmable Network Dataplane Constraints
Published in: | IEEE Access 2024, Vol.12, p.158561-158578 |
---|---|
Main Authors: | Jurkiewicz, Piotr; Kadziolka, Bartosz; Kantor, Miroslaw; Wojcik, Robert; Domzal, Jerzy |
Format: | Article |
Language: | English |
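Subjects: | Accuracy; Classification; Computer networks; Constraints; Data models; Decision trees; elephant; Elephants; Flow production systems; Flows; Format; IP networks; Machine learning; Measurement; Performance measurement; QoS; Quality of service architectures; Random forests; Routing; SDN; Switches; Telecommunication traffic; Traffic control; Traffic engineering; Traffic flow; Training |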
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c244t-18bd134d12dcbff486cb72f35479a0bc7f00f30b5bafa1cd4b7c59672771ea4e3 |
container_end_page | 158578 |
container_issue | |
container_start_page | 158561 |
container_title | IEEE Access |
container_volume | 12 |
creator | Jurkiewicz, Piotr; Kadziolka, Bartosz; Kantor, Miroslaw; Wojcik, Robert; Domzal, Jerzy |
description | This paper investigates the application of tree-based machine learning classifiers for flow-based traffic engineering, focusing on the binary classification of IP network flows into mice (short flows) and elephants (long flows) using 5-tuple header fields from the first packet. Unlike prior studies on network flow classification, our analysis uses performance metrics normalized by traffic coverage, ensuring relevance for traffic engineering and QoS applications. We also evaluate models within the constraints of programmable switching hardware, such as the Intel Tofino P4 chip. Our findings show that such constrained models can achieve high accuracy while performing inference at line rate in the dataplane. Additionally, we reveal a trade-off between tree depth and input format, with bit transformations enabling more efficient feature extraction at lower depths. Our results show that optimal tree depths range from 15 to 25 levels, depending on the input format. The most effective model employs extremely randomized trees with bit-transformed input and trees of depth 20. |
doi_str_mv | 10.1109/ACCESS.2024.3485588 |
format | article |
publisher | Piscataway: IEEE |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
coden | IAECCG |
ieee_id | 10731663 |
tpages | 18 |
orcid | 0000-0002-9562-606X; 0000-0002-0774-610X; 0000-0002-8379-2810; 0000-0003-3093-9089 |
oa | free_for_read |
peer_review | peer_reviewed |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE Access, 2024, Vol.12, p.158561-158578 |
issn | 2169-3536 (ISSN and EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1109_ACCESS_2024_3485588 |
source | IEEE Open Access Journals |
subjects | Accuracy; Classification; Computer networks; Constraints; Data models; Decision trees; elephant; Elephants; Flow production systems; Flows; Format; IP networks; Machine learning; Measurement; Performance measurement; QoS; Quality of service architectures; Random forests; Routing; SDN; Switches; Telecommunication traffic; Traffic control; Traffic engineering; Traffic flow; Training |
title | Elephant Flow Detection With Random Forest Models Under Programmable Network Dataplane Constraints |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A02%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Elephant%20Flow%20Detection%20With%20Random%20Forest%20Models%20Under%20Programmable%20Network%20Dataplane%20Constraints&rft.jtitle=IEEE%20access&rft.au=Jurkiewicz,%20Piotr&rft.date=2024&rft.volume=12&rft.spage=158561&rft.epage=158578&rft.pages=158561-158578&rft.issn=2169-3536&rft.eissn=2169-3536&rft.coden=IAECCG&rft_id=info:doi/10.1109/ACCESS.2024.3485588&rft_dat=%3Cproquest_cross%3E3123122420%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c244t-18bd134d12dcbff486cb72f35479a0bc7f00f30b5bafa1cd4b7c59672771ea4e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3123122420&rft_id=info:pmid/&rft_ieee_id=10731663&rfr_iscdi=true |
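The abstract (description field) contrasts plain and bit-transformed encodings of the first packet's 5-tuple. The following is a minimal Python sketch of one way to build both encodings. It assumes IPv4 headers, and it assumes that "bit transformation" means expanding each header field into its individual bits, which gives 104 binary features. The function names are hypothetical, and the paper's actual feature layout may differ.

```python
import ipaddress

# Standard IPv4 5-tuple field widths in bits: src IP, dst IP, src port, dst port, protocol.
# 32 + 32 + 16 + 16 + 8 = 104 bits in total.
FIELD_WIDTHS = (32, 32, 16, 16, 8)


def to_bits(value, width):
    """Expand an unsigned integer into `width` bits, most significant bit first."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def five_tuple_features(src_ip, dst_ip, src_port, dst_port, proto, bitwise=True):
    """Hypothetical feature extractor for the first packet of a flow.

    bitwise=False returns five integer features (one per header field);
    bitwise=True returns 104 binary features (one per header bit).
    """
    values = (
        int(ipaddress.IPv4Address(src_ip)),
        int(ipaddress.IPv4Address(dst_ip)),
        src_port,
        dst_port,
        proto,
    )
    if not bitwise:
        return list(values)
    features = []
    for value, width in zip(values, FIELD_WIDTHS):
        features.extend(to_bits(value, width))
    return features


raw = five_tuple_features("10.0.0.1", "192.168.1.2", 443, 51514, 6, bitwise=False)
bits = five_tuple_features("10.0.0.1", "192.168.1.2", 443, 51514, 6)
print(raw)        # [167772161, 3232235778, 443, 51514, 6]
print(len(bits))  # 104
```

With the integer encoding, a tree split compares a whole field against a threshold, such as a destination-port range. With the bit encoding, each split tests a single bit, such as one bit of an address prefix. The abstract reports that this input-format choice interacts with the required tree depth.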
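The abstract (description field) says the evaluation uses performance metrics normalized by traffic coverage rather than per-flow counts. The following Python sketch is an illustration only. It assumes a byte-weighted reading of that idea, in which each flow contributes its total byte count. The `Flow` record and the metric names are hypothetical, not the paper's definitions. The sketch contrasts plain per-flow accuracy with the share of elephant bytes the classifier catches.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    size_bytes: int           # total bytes carried by the flow
    is_elephant: bool         # ground-truth label
    predicted_elephant: bool  # classifier decision made from the first packet


def flow_accuracy(flows):
    """Fraction of flows classified correctly; every flow counts equally."""
    return sum(f.is_elephant == f.predicted_elephant for f in flows) / len(flows)


def byte_weighted_metrics(flows):
    """Hypothetical traffic-weighted metrics: each flow counts by its bytes."""
    total = sum(f.size_bytes for f in flows)
    elephant = sum(f.size_bytes for f in flows if f.is_elephant)
    flagged = sum(f.size_bytes for f in flows if f.predicted_elephant)
    hit = sum(f.size_bytes for f in flows if f.is_elephant and f.predicted_elephant)
    return {
        # share of elephant bytes that were flagged as elephants
        "byte_recall": hit / elephant if elephant else 0.0,
        # share of flagged bytes that really belong to elephants
        "byte_precision": hit / flagged if flagged else 0.0,
        # share of all traffic carried by flows flagged as elephants
        "coverage": flagged / total if total else 0.0,
    }


flows = [
    Flow(9000, True, True),
    Flow(1000, True, False),
    Flow(100, False, True),
    Flow(100, False, False),
]
print(flow_accuracy(flows))                         # 0.5
print(byte_weighted_metrics(flows)["byte_recall"])  # 0.9
```

In this toy example, half of the flows are misclassified, yet 90% of elephant bytes are flagged. A per-flow score and a traffic-weighted score can therefore rank the same models differently.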