Loading…

PROADAPT: Proactive framework for adaptive partitioning for big data warehouses

Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the w...

Full description

Saved in:
Bibliographic Details
Published in:Data & knowledge engineering 2022-11, Vol.142, p.102102, Article 102102
Main Authors: Benkrid, Soumia, Bellatreche, Ladjel, Mestoui, Yacine, Ordonez, Carlos
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Parallel DBMSs have become more and more mature and getting several success stories in the industry. This situation has been reached by powerful data partitioning and data allocation techniques and algorithms. By analyzing these findings closely, we figure out that they are quickly stressed by the workload changes, which represents the usual case of business analytics applications. To deal with this challenge in the context of big data warehouses, several studies proposed to move to another processing paradigm outside the DBMS realm such as Spark by proposing adaptive partitioning solutions to tackle the workload changes. The majority of approaches are offline and those that are online cause significant random disk I/O costs. This is because the correlation that may exist between jobs and data blocks read from the disk is not captured to refine the adaptive partitioning algorithms. This represents one of the major causes of providing high performance of dynamic workloads. To solve such limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning (called PROADAPT) that uses an AI-inspired methodology that can be connected to any query optimizer managing partitioned data. This methodology uses a genetic algorithm to solve our formalized problem that considers the interaction that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and then improves I/O performance. Different technical modules of our framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach.
ISSN:0169-023X
1872-6933
DOI:10.1016/j.datak.2022.102102