PROADAPT: Proactive framework for adaptive partitioning for big data warehouses

Bibliographic Details
Published in:	Data & knowledge engineering 2022-11, Vol.142, p.102102, Article 102102
Main Authors:	Benkrid, Soumia; Bellatreche, Ladjel; Mestoui, Yacine; Ordonez, Carlos
Format:	Article
Language:	English
Subjects:	Dimension’s hierarchies; Multidimensional partitioning; Self-adaptive partitioning; Utility maximization; Workload clustering

cited_by	cdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3
cites	cdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3
container_end_page
container_issue
container_start_page	102102
container_title	Data & knowledge engineering
container_volume	142
creator	Benkrid, Soumia; Bellatreche, Ladjel; Mestoui, Yacine; Ordonez, Carlos
description	Parallel DBMSs have matured and now count several success stories in industry. This success rests on powerful data partitioning and data allocation techniques and algorithms. A close analysis of these techniques shows that they are quickly stressed by workload changes, which are the usual case in business analytics applications. To deal with this challenge in the context of big data warehouses, several studies have moved to processing paradigms outside the DBMS realm, such as Spark, and proposed adaptive partitioning solutions to handle workload changes. Most of these approaches are offline, and those that are online incur significant random disk I/O costs, because the correlation that may exist between jobs and the data blocks read from disk is not captured to refine the adaptive partitioning algorithms. This is one of the major obstacles to achieving high performance on dynamic workloads. To address these limitations, we propose a proactive framework for query-aware adaptive partitioning (called PROADAPT), based on an AI-inspired methodology, that can be connected to any query optimizer managing partitioned data. The methodology uses a genetic algorithm to solve our formalized problem, which takes into account the interactions that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and thereby improve I/O performance. The technical modules of the framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.
doi_str_mv	10.1016/j.datak.2022.102102
format	article
publisher	Elsevier B.V
rights	2022 Elsevier B.V.
toplevel	peer_reviewed; online_resources
fulltext	fulltext
identifier	ISSN: 0169-023X
ispartof	Data & knowledge engineering, 2022-11, Vol.142, p.102102, Article 102102
issn	0169-023X 1872-6933
language	eng
recordid	cdi_crossref_primary_10_1016_j_datak_2022_102102
source	ScienceDirect Freedom Collection 2022-2024
subjects	Dimension’s hierarchies; Multidimensional partitioning; Self-adaptive partitioning; Utility maximization; Workload clustering
title	PROADAPT: Proactive framework for adaptive partitioning for big data warehouses
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T02%3A32%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PROADAPT:%20Proactive%20framework%20for%20adaptive%20partitioning%20for%20big%20data%20warehouses&rft.jtitle=Data%20&%20knowledge%20engineering&rft.au=Benkrid,%20Soumia&rft.date=2022-11&rft.volume=142&rft.spage=102102&rft.pages=102102-&rft.artnum=102102&rft.issn=0169-023X&rft.eissn=1872-6933&rft_id=info:doi/10.1016/j.datak.2022.102102&rft_dat=%3Celsevier_cross%3ES0169023X22000933%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c233t-9ba70368dd4c08229f625c58bd042ee31cff2f6ccc82fc06ab6cd8bda790ed9d3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true