PROADAPT: Proactive framework for adaptive partitioning for big data warehouses


Bibliographic Details
Published in: Data & knowledge engineering 2022-11, Vol.142, p.102102, Article 102102
Main Authors: Benkrid, Soumia; Bellatreche, Ladjel; Mestoui, Yacine; Ordonez, Carlos
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3
cites cdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3
container_end_page
container_issue
container_start_page 102102
container_title Data & knowledge engineering
container_volume 142
creator Benkrid, Soumia
Bellatreche, Ladjel
Mestoui, Yacine
Ordonez, Carlos
description Parallel DBMSs have matured and can point to several industrial success stories. This success rests on powerful data partitioning and data allocation techniques and algorithms. A closer look at these techniques shows that they are quickly stressed by workload changes, which are the usual case in business analytics applications. To deal with this challenge in the context of big data warehouses, several studies have proposed moving to another processing paradigm outside the DBMS realm, such as Spark, with adaptive partitioning solutions that tackle workload changes. The majority of these approaches are offline, and those that are online incur significant random disk I/O costs. This is because the correlation that may exist between jobs and the data blocks read from disk is not captured to refine the adaptive partitioning algorithms. This is one of the major obstacles to providing high performance for dynamic workloads. To address these limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT), based on an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem, which considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and thereby improve I/O performance. The different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.
doi_str_mv 10.1016/j.datak.2022.102102
format article
fullrecord <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_datak_2022_102102</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0169023X22000933</els_id><sourcerecordid>S0169023X22000933</sourcerecordid><originalsourceid>FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3</originalsourceid><addsrcrecordid>eNp9kN1KAzEQhYMoWKtP4M2-wNbZSZvuCl6U-gtCi1TwLmQnk5rWNiVZW3x7t63XwsDAOXNmhk-I6wJ6BRTqZtGzpjHLHgJiq2BbJ6JTlEPMVSXlqei0U1UOKD_OxUVKCwDAPgw6YjJ9m4zuR9PZbTaNwVDjt5y5aFa8C3GZuRAzY83mIG9MbHzjw9qv5wen9vNsfzjbmcif4TtxuhRnznwlvvrrXfH--DAbP-evk6eX8eg1J5SyyavaDEGq0to-QYlYOYUDGpS1hT4yy4KcQ6eIqERHoEytyLauGVbAtrKyK-RxL8WQUmSnN9GvTPzRBeg9E73QByZ6z0QfmbSpu2OK29e2nqNO5HlNbH1karQN_t_8Lw66bIY</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>PROADAPT: Proactive framework for adaptive partitioning for big data warehouses</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>Benkrid, Soumia ; Bellatreche, Ladjel ; Mestoui, Yacine ; Ordonez, Carlos</creator><creatorcontrib>Benkrid, Soumia ; Bellatreche, Ladjel ; Mestoui, Yacine ; Ordonez, Carlos</creatorcontrib><description>Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the workload changes, which represents the usual case of business analytics applications. To deal with this challenge in the context of big data warehouses, several studies proposed to move to another processing paradigm outside the DBMS realm such as Spark by proposing adaptive partitioning solutions to tackle the workload changes. 
The majority of approaches are offline and those that are online cause significant random disk I/O costs. This is because the correlation that may exist between jobs and data blocks read from the disk is not captured to refine the adaptive partitioning algorithms. This represents one of the major causes of providing high performance of dynamic workloads. To solve such limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT) that uses an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem that considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and then improves I/O performance. Different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.</description><identifier>ISSN: 0169-023X</identifier><identifier>EISSN: 1872-6933</identifier><identifier>DOI: 10.1016/j.datak.2022.102102</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Dimension’s hierarchies ; Multidimensional partitioning ; Self-adaptive partitioning ; Utility maximization ; Workload clustering</subject><ispartof>Data &amp; knowledge engineering, 2022-11, Vol.142, p.102102, Article 102102</ispartof><rights>2022 Elsevier 
B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3</citedby><cites>FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Benkrid, Soumia</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Mestoui, Yacine</creatorcontrib><creatorcontrib>Ordonez, Carlos</creatorcontrib><title>PROADAPT: Proactive framework for adaptive partitioning for big data warehouses</title><title>Data &amp; knowledge engineering</title><description>Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the workload changes, which represents the usual case of business analytics applications. To deal with this challenge in the context of big data warehouses, several studies proposed to move to another processing paradigm outside the DBMS realm such as Spark by proposing adaptive partitioning solutions to tackle the workload changes. The majority of approaches are offline and those that are online cause significant random disk I/O costs. This is because the correlation that may exist between jobs and data blocks read from the disk is not captured to refine the adaptive partitioning algorithms. This represents one of the major causes of providing high performance of dynamic workloads. 
To solve such limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT) that uses an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem that considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and then improves I/O performance. Different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.</description><subject>Dimension’s hierarchies</subject><subject>Multidimensional partitioning</subject><subject>Self-adaptive partitioning</subject><subject>Utility maximization</subject><subject>Workload clustering</subject><issn>0169-023X</issn><issn>1872-6933</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNp9kN1KAzEQhYMoWKtP4M2-wNbZSZvuCl6U-gtCi1TwLmQnk5rWNiVZW3x7t63XwsDAOXNmhk-I6wJ6BRTqZtGzpjHLHgJiq2BbJ6JTlEPMVSXlqei0U1UOKD_OxUVKCwDAPgw6YjJ9m4zuR9PZbTaNwVDjt5y5aFa8C3GZuRAzY83mIG9MbHzjw9qv5wen9vNsfzjbmcif4TtxuhRnznwlvvrrXfH--DAbP-evk6eX8eg1J5SyyavaDEGq0to-QYlYOYUDGpS1hT4yy4KcQ6eIqERHoEytyLauGVbAtrKyK-RxL8WQUmSnN9GvTPzRBeg9E73QByZ6z0QfmbSpu2OK29e2nqNO5HlNbH1karQN_t_8Lw66bIY</recordid><startdate>202211</startdate><enddate>202211</enddate><creator>Benkrid, Soumia</creator><creator>Bellatreche, Ladjel</creator><creator>Mestoui, Yacine</creator><creator>Ordonez, Carlos</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>202211</creationdate><title>PROADAPT: Proactive framework for adaptive partitioning for big data warehouses</title><author>Benkrid, Soumia ; Bellatreche, Ladjel ; Mestoui, Yacine ; Ordonez, 
Carlos</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Dimension’s hierarchies</topic><topic>Multidimensional partitioning</topic><topic>Self-adaptive partitioning</topic><topic>Utility maximization</topic><topic>Workload clustering</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Benkrid, Soumia</creatorcontrib><creatorcontrib>Bellatreche, Ladjel</creatorcontrib><creatorcontrib>Mestoui, Yacine</creatorcontrib><creatorcontrib>Ordonez, Carlos</creatorcontrib><collection>CrossRef</collection><jtitle>Data &amp; knowledge engineering</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Benkrid, Soumia</au><au>Bellatreche, Ladjel</au><au>Mestoui, Yacine</au><au>Ordonez, Carlos</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>PROADAPT: Proactive framework for adaptive partitioning for big data warehouses</atitle><jtitle>Data &amp; knowledge engineering</jtitle><date>2022-11</date><risdate>2022</risdate><volume>142</volume><spage>102102</spage><pages>102102-</pages><artnum>102102</artnum><issn>0169-023X</issn><eissn>1872-6933</eissn><abstract>Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the workload changes, which represents the usual case of business analytics applications. 
To deal with this challenge in the context of big data warehouses, several studies proposed to move to another processing paradigm outside the DBMS realm such as Spark by proposing adaptive partitioning solutions to tackle the workload changes. The majority of approaches are offline and those that are online cause significant random disk I/O costs. This is because the correlation that may exist between jobs and data blocks read from the disk is not captured to refine the adaptive partitioning algorithms. This represents one of the major causes of providing high performance of dynamic workloads. To solve such limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT) that uses an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem that considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and then improves I/O performance. Different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.datak.2022.102102</doi></addata></record>
fulltext fulltext
identifier ISSN: 0169-023X
ispartof Data & knowledge engineering, 2022-11, Vol.142, p.102102, Article 102102
issn 0169-023X
1872-6933
language eng
recordid cdi_crossref_primary_10_1016_j_datak_2022_102102
source ScienceDirect Freedom Collection 2022-2024
subjects Dimension’s hierarchies
Multidimensional partitioning
Self-adaptive partitioning
Utility maximization
Workload clustering
title PROADAPT: Proactive framework for adaptive partitioning for big data warehouses
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T02%3A32%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PROADAPT:%20Proactive%20framework%20for%20adaptive%20partitioning%20for%20big%20data%20warehouses&rft.jtitle=Data%20&%20knowledge%20engineering&rft.au=Benkrid,%20Soumia&rft.date=2022-11&rft.volume=142&rft.spage=102102&rft.pages=102102-&rft.artnum=102102&rft.issn=0169-023X&rft.eissn=1872-6933&rft_id=info:doi/10.1016/j.datak.2022.102102&rft_dat=%3Celsevier_cross%3ES0169023X22000933%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true