On the performance of SQL scalable systems on Kubernetes: a comparative study

Bibliographic Details
Published in: Cluster Computing, 2023-06, Vol. 26 (3), p. 1935-1947
Main Authors: Cardas, Cristian, Aldana-Martín, José F., Burgueño-Romero, Antonio M., Nebro, Antonio J., Mateos, Jose M., Sánchez, Juan J.
Format: Article
Language: English
Description
Summary: The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications has led to the upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines allowing the use of SQL queries on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment and scaling of containerized applications; however, there is a lack of studies about the performance of SQL-on-Hadoop systems deployed on Kubernetes, and this is the gap we intend to fill in this paper. We present an experimental study involving four representative SQL scalable platforms: Apache Drill, Apache Hive, Apache Spark SQL and Trino. Concretely, we analyze the performance of these systems when they are deployed on a Hadoop cluster with Kubernetes by using the TPC-H benchmark. The results of our study can help practitioners and users understand what they can expect in terms of performance if they plan to use the advantages of Kubernetes to deploy applications using the analyzed SQL scalable platforms.
ISSN: 1386-7857, 1573-7543
DOI: 10.1007/s10586-022-03718-9