PROADAPT: Proactive framework for adaptive partitioning for big data warehouses
Published in: | Data & knowledge engineering 2022-11, Vol.142, p.102102, Article 102102 |
---|---|
Format: | Article |
Language: | English |
Summary: | Parallel DBMSs have steadily matured and have accumulated several success stories in industry. This progress rests on powerful data partitioning and data allocation techniques and algorithms. A close analysis of these techniques shows that they are quickly stressed by workload changes, which are the usual case in business analytics applications. To deal with this challenge in the context of big data warehouses, several studies have proposed moving to a processing paradigm outside the DBMS realm, such as Spark, with adaptive partitioning solutions that tackle workload changes. Most of these approaches are offline, and those that are online incur significant random disk I/O costs. This is because the correlation that may exist between jobs and the data blocks read from disk is not captured and used to refine the adaptive partitioning algorithms. This is one of the major obstacles to achieving high performance on dynamic workloads. To overcome these limitations, we propose in this paper a proactive framework for query-aware adaptive partitioning, called PROADAPT, that uses an AI-inspired methodology which can be connected to any query optimizer managing partitioned data. The methodology uses a genetic algorithm to solve our formalized problem, which takes into account the interactions that may exist among workload queries. PROADAPT intensively rewrites queries by exploiting dimension hierarchies to skip irrelevant data and thereby improve I/O performance. The technical modules of the framework are discussed. Finally, we conduct intensive experiments on Postgres-XL and a Spark SQL parallel cluster to show the effectiveness and efficiency of our approach. |
---|---|
ISSN: | 0169-023X 1872-6933 |
DOI: | 10.1016/j.datak.2022.102102 |