
Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning


Bibliographic Details
Main Authors: Sagaama, Houssem, Slimane, Nourchene Ben, Marwani, Maher, Skhiri, Sabri
Format: Conference Proceeding
Language: English
container_end_page 7
container_issue
container_start_page 1
container_title
container_volume
creator Sagaama, Houssem
Slimane, Nourchene Ben
Marwani, Maher
Skhiri, Sabri
description Tuning big data frameworks is an important step toward getting the best performance for a given application. However, these frameworks are rarely used individually; they generally form a pipeline in which each framework has a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. Moreover, the interaction between these frameworks has to be considered when tuning the configuration parameters of the pipeline, so a trade-off is required to achieve the best end-to-end performance. Machine-learning-based methods have shown great success in automatic tuning systems, but they rely on a large number of high-quality learning examples that are difficult to obtain. In this context, we propose using a deep reinforcement learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3), to tune a fraud detection big data pipeline. Our experiments show that the TD3 agent improves the overall performance of the pipeline by up to 63% with only 200 training steps, outperforming random search on the high-dimensional search space.
doi_str_mv 10.1109/ISCC53001.2021.9631440
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_9631440</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9631440</ieee_id><sourcerecordid>9631440</sourcerecordid><originalsourceid>FETCH-LOGICAL-i203t-daea1ffa2f5145e85f69f78210fcb89ca7c28db83d7f3424c0870c71fda5a1ac3</originalsourceid><addsrcrecordid>eNotkM1OAjEURquJiYg8gYnpCwze_k3bJQ6iJCQSxDW5dG6xhhnITInx7dXI6tuccxYfY_cCxkKAf5i_VZVRAGIsQYqxL5XQGi7YyFsnytJoabX2l2wgSy0Lq5y_Zjd9_wkAzkg7YKvJKR8azCnwJXbYUKaOr09tanc8Hjr-mHZ8ihn5Mh1pn1rq-VfKH3xKdOQrSu0vFKihNvMFYffn3bKriPueRucdsvfZ07p6KRavz_NqsiiSBJWLGglFjCijEdqQM7H00TopIIat8wFtkK7eOlXbqLTUAZyFYEWs0aDAoIbs7r-biGhz7FKD3ffmfIH6AUw6UaE</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning</title><source>IEEE Xplore All Conference Series</source><creator>Sagaama, Houssem ; Slimane, Nourchene Ben ; Marwani, Maher ; Skhiri, Sabri</creator><creatorcontrib>Sagaama, Houssem ; Slimane, Nourchene Ben ; Marwani, Maher ; Skhiri, Sabri</creatorcontrib><description>Tuning big data frameworks is a very important task to get the best performance for a given application. However, these frameworks are rarely used individually, they generally constitute a pipeline, each having a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. Moreover, we have to consider the interaction between these frameworks when tuning the configuration parameters of the big data pipeline. A trade-off is then required to achieve the best end-to-end performance. Machine learning based methods have shown great success in automatic tuning systems, but they rely on a large number of high quality learning examples that are rather difficult to obtain. 
In this context, we propose to use a deep reinforcement learning algorithm, namely Twin Delayed Deep Deterministic Policy Gradient, TD3, to tune a fraud detection big data pipeline. We show through the conducted experiments that the TD3 agent improves the overall performance of the pipeline by up to 63% with only 200 training steps, outperforming the random search on the high-dimensional search space.</description><identifier>EISSN: 2642-7389</identifier><identifier>EISBN: 9781665427449</identifier><identifier>EISBN: 1665427442</identifier><identifier>DOI: 10.1109/ISCC53001.2021.9631440</identifier><language>eng</language><publisher>IEEE</publisher><subject>Actor-Critic ; Auto-tuning system ; Big Data ; Big Data Pipelines ; Computers ; Deep Reinforcement Learning ; Machine learning algorithms ; Performance Optimization ; Pipelines ; Reinforcement learning ; Task analysis ; Training</subject><ispartof>2021 IEEE Symposium on Computers and Communications (ISCC), 2021, p.1-7</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed><orcidid>0000-0001-8760-3557 ; 0000-0003-0434-310X ; 0000-0002-0664-5788 ; 0000-0001-7792-7857</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9631440$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9631440$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Sagaama, Houssem</creatorcontrib><creatorcontrib>Slimane, Nourchene Ben</creatorcontrib><creatorcontrib>Marwani, Maher</creatorcontrib><creatorcontrib>Skhiri, Sabri</creatorcontrib><title>Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning</title><title>2021 IEEE Symposium on Computers and Communications 
(ISCC)</title><addtitle>ISCC</addtitle><description>Tuning big data frameworks is a very important task to get the best performance for a given application. However, these frameworks are rarely used individually, they generally constitute a pipeline, each having a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. Moreover, we have to consider the interaction between these frameworks when tuning the configuration parameters of the big data pipeline. A trade-off is then required to achieve the best end-to-end performance. Machine learning based methods have shown great success in automatic tuning systems, but they rely on a large number of high quality learning examples that are rather difficult to obtain. In this context, we propose to use a deep reinforcement learning algorithm, namely Twin Delayed Deep Deterministic Policy Gradient, TD3, to tune a fraud detection big data pipeline. We show through the conducted experiments that the TD3 agent improves the overall performance of the pipeline by up to 63% with only 200 training steps, outperforming the random search on the high-dimensional search space.</description><subject>Actor-Critic</subject><subject>Auto-tuning system</subject><subject>Big Data</subject><subject>Big Data Pipelines</subject><subject>Computers</subject><subject>Deep Reinforcement Learning</subject><subject>Machine learning algorithms</subject><subject>Performance Optimization</subject><subject>Pipelines</subject><subject>Reinforcement learning</subject><subject>Task 
analysis</subject><subject>Training</subject><issn>2642-7389</issn><isbn>9781665427449</isbn><isbn>1665427442</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2021</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotkM1OAjEURquJiYg8gYnpCwze_k3bJQ6iJCQSxDW5dG6xhhnITInx7dXI6tuccxYfY_cCxkKAf5i_VZVRAGIsQYqxL5XQGi7YyFsnytJoabX2l2wgSy0Lq5y_Zjd9_wkAzkg7YKvJKR8azCnwJXbYUKaOr09tanc8Hjr-mHZ8ihn5Mh1pn1rq-VfKH3xKdOQrSu0vFKihNvMFYffn3bKriPueRucdsvfZ07p6KRavz_NqsiiSBJWLGglFjCijEdqQM7H00TopIIat8wFtkK7eOlXbqLTUAZyFYEWs0aDAoIbs7r-biGhz7FKD3ffmfIH6AUw6UaE</recordid><startdate>20210905</startdate><enddate>20210905</enddate><creator>Sagaama, Houssem</creator><creator>Slimane, Nourchene Ben</creator><creator>Marwani, Maher</creator><creator>Skhiri, Sabri</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><orcidid>https://orcid.org/0000-0001-8760-3557</orcidid><orcidid>https://orcid.org/0000-0003-0434-310X</orcidid><orcidid>https://orcid.org/0000-0002-0664-5788</orcidid><orcidid>https://orcid.org/0000-0001-7792-7857</orcidid></search><sort><creationdate>20210905</creationdate><title>Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning</title><author>Sagaama, Houssem ; Slimane, Nourchene Ben ; Marwani, Maher ; Skhiri, Sabri</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i203t-daea1ffa2f5145e85f69f78210fcb89ca7c28db83d7f3424c0870c71fda5a1ac3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2021</creationdate><topic>Actor-Critic</topic><topic>Auto-tuning system</topic><topic>Big Data</topic><topic>Big Data Pipelines</topic><topic>Computers</topic><topic>Deep Reinforcement Learning</topic><topic>Machine learning algorithms</topic><topic>Performance 
Optimization</topic><topic>Pipelines</topic><topic>Reinforcement learning</topic><topic>Task analysis</topic><topic>Training</topic><toplevel>online_resources</toplevel><creatorcontrib>Sagaama, Houssem</creatorcontrib><creatorcontrib>Slimane, Nourchene Ben</creatorcontrib><creatorcontrib>Marwani, Maher</creatorcontrib><creatorcontrib>Skhiri, Sabri</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Sagaama, Houssem</au><au>Slimane, Nourchene Ben</au><au>Marwani, Maher</au><au>Skhiri, Sabri</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning</atitle><btitle>2021 IEEE Symposium on Computers and Communications (ISCC)</btitle><stitle>ISCC</stitle><date>2021-09-05</date><risdate>2021</risdate><spage>1</spage><epage>7</epage><pages>1-7</pages><eissn>2642-7389</eissn><eisbn>9781665427449</eisbn><eisbn>1665427442</eisbn><abstract>Tuning big data frameworks is a very important task to get the best performance for a given application. However, these frameworks are rarely used individually, they generally constitute a pipeline, each having a different role. This makes tuning big data pipelines an important yet difficult task given the size of the search space. Moreover, we have to consider the interaction between these frameworks when tuning the configuration parameters of the big data pipeline. A trade-off is then required to achieve the best end-to-end performance. 
Machine learning based methods have shown great success in automatic tuning systems, but they rely on a large number of high quality learning examples that are rather difficult to obtain. In this context, we propose to use a deep reinforcement learning algorithm, namely Twin Delayed Deep Deterministic Policy Gradient, TD3, to tune a fraud detection big data pipeline. We show through the conducted experiments that the TD3 agent improves the overall performance of the pipeline by up to 63% with only 200 training steps, outperforming the random search on the high-dimensional search space.</abstract><pub>IEEE</pub><doi>10.1109/ISCC53001.2021.9631440</doi><tpages>7</tpages><orcidid>https://orcid.org/0000-0001-8760-3557</orcidid><orcidid>https://orcid.org/0000-0003-0434-310X</orcidid><orcidid>https://orcid.org/0000-0002-0664-5788</orcidid><orcidid>https://orcid.org/0000-0001-7792-7857</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2642-7389
ispartof 2021 IEEE Symposium on Computers and Communications (ISCC), 2021, p.1-7
issn 2642-7389
language eng
recordid cdi_ieee_primary_9631440
source IEEE Xplore All Conference Series
subjects Actor-Critic
Auto-tuning system
Big Data
Big Data Pipelines
Computers
Deep Reinforcement Learning
Machine learning algorithms
Performance Optimization
Pipelines
Reinforcement learning
Task analysis
Training
title Automatic Parameter Tuning for Big Data Pipelines with Deep Reinforcement Learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T23%3A52%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Automatic%20Parameter%20Tuning%20for%20Big%20Data%20Pipelines%20with%20Deep%20Reinforcement%20Learning&rft.btitle=2021%20IEEE%20Symposium%20on%20Computers%20and%20Communications%20(ISCC)&rft.au=Sagaama,%20Houssem&rft.date=2021-09-05&rft.spage=1&rft.epage=7&rft.pages=1-7&rft.eissn=2642-7389&rft_id=info:doi/10.1109/ISCC53001.2021.9631440&rft.eisbn=9781665427449&rft.eisbn_list=1665427442&rft_dat=%3Cieee_CHZPO%3E9631440%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i203t-daea1ffa2f5145e85f69f78210fcb89ca7c28db83d7f3424c0870c71fda5a1ac3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9631440&rfr_iscdi=true