MRA++: Scheduling and data placement on MapReduce for heterogeneous environments

Bibliographic Details
Published in: Future Generation Computer Systems, 2015-01, Vol. 42, p. 22-35
Main Authors: Anjos, Julio C.S.; Carrera, Iván; Kolberg, Wagner; Tibola, Andre Luis; Arantes, Luciana B.; Geyer, Claudio R.
Format: Article
Language: English
Description
Summary: MapReduce has emerged as a popular programming model in the field of data-intensive computing. This is due to its simple design, which provides ease of use for programmers, and to framework implementations such as Hadoop, which have been adopted by large business and technology companies. In this paper we make some improvements to the Hadoop MapReduce framework by introducing algorithms that are suitable for heterogeneous environments. The goal is to perform data-intensive computing efficiently in heterogeneous environments. The need for these adaptations derives from the fact that, following the framework design proposed by Google, Hadoop is optimized to run in large homogeneous clusters. Hence we propose MRA++, a new MapReduce framework design that considers the heterogeneity of nodes during data distribution, task scheduling and job control. MRA++ establishes a training task to gather information prior to the data distribution. This adds a delay to the setup phase; however, we show that the delay is offset by the effectiveness of the mechanisms and algorithms, which achieve performance gains of more than 70% in 10 Mbps networks.
• MRA++: MapReduce with Adapted Algorithms for Heterogeneous Environments.
• To address the main problems caused by the simplification of the MapReduce model.
• The algorithms will allow the use of data-intensive applications on the Internet.
• This proposal suggests a potential value for use on desktop grids.
• A low delay in the setup phase of jobs justifies the use of these algorithms.
ISSN: 0167-739X, 1872-7115
DOI: 10.1016/j.future.2014.09.001