
MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters


Bibliographic Details
Published in: Journal of parallel and distributed computing 2018-10, Vol.120, p.237-250
Main Authors: Rahman, Md. Wasi-ur, Islam, Nusrat Sharmin, Lu, Xiaoyi, Shankar, Dipti, Panda, Dhabaleswar K. (DK)
Format: Article
Language: English
Description
Summary: MapReduce is the most popular parallel computing framework for big data processing, allowing massive scalability across distributed computing environments. An advanced RDMA-based design of Hadoop MapReduce has been proposed that alleviates the performance bottlenecks of default Hadoop MapReduce by leveraging the benefits of RDMA. The data processing engine Spark, on the other hand, provides fast execution of MapReduce applications through in-memory processing. Optimizing the performance of these contemporary big data processing frameworks on modern High-Performance Computing (HPC) systems is a formidable task because of the numerous configuration possibilities in each of them. In this paper, we propose MR-Advisor, a comprehensive tuning, profiling, and prediction tool for MapReduce. MR-Advisor is generalized to provide performance optimizations for Hadoop, Spark, and RDMA-enhanced Hadoop MapReduce designs over different file systems such as HDFS, Lustre, and Tachyon. Performance evaluations reveal that, with MR-Advisor’s suggested values, job execution performance can be improved by up to 58% over the current best-practice values for user-level configuration parameters. To the best of our knowledge, this is the first tool that supports tuning and prediction for both Apache Hadoop and Spark, as well as the RDMA- and Lustre-based advanced designs.
• Recommendation with the best possible configuration for MapReduce applications.
• Generalized configuration parameter space to enable any big data middleware.
• Profiling and prediction capabilities with black-box and profiling-based approaches.
ISSN: 0743-7315, 1096-0848
DOI: 10.1016/j.jpdc.2017.11.004