Global reduction for geo-distributed MapReduce across cloud federation
Published in: | Future Generation Computer Systems 2025-01, Vol.162, p.107492, Article 107492 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Geo-distributed big data processing is growing steadily: data increasingly originates in different countries and is held in datacenters (DCs) across the globe, and applications use multiple sites to improve reliability, security, and processing performance. Popular frameworks such as Hadoop and Spark have been re-designed to process geographically distributed data where it resides. However, these approaches still transfer large amounts of data over the Internet, which incurs high processing time and cost for many applications, particularly since in several cases the output of a computation is smaller than its input. In this paper, we keep the data locality principle, processing data at its different locations, but drop the principle of transferring all intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework for federated clouds built on two heuristic algorithms: (i) GResearch, which chooses the best clusters as global reducers to reduce communication and optimize bandwidth usage; and (ii) Geo-MR, which schedules only the relevant data to the selected global reducers that compute the final results. As a baseline, we propose an exact MapReduce scheduling model, used to benchmark and discuss the results of the Geo-MR heuristic. The experimental results show that Geo-MR can improve the utilization of cloud-federation resources (bandwidth and the VMs of clusters) and consequently reduce cost and job response time.
• A heuristic algorithm, GResearch, is proposed to choose the best clusters as global reducers.
• A heuristic scheduling algorithm, Geo-MR, is proposed to schedule only the relevant data to the selected global reducers that compute the final results. Geo-MR reduces communication and optimizes bandwidth usage.
• An exact scheduling algorithm, formulated and solved as a mixed-integer program, is proposed.
• The performance evaluation shows reduced MapReduce job cost and job response time while ensuring optimal resource allocation. |
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2024.107492 |
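The abstract's core idea, gathering each portion of the intermediate results at a well-placed reducer instead of shipping everything to one global reducer, can be illustrated with a toy model. The Python sketch below is hypothetical and is not the paper's GResearch or Geo-MR algorithm. It assumes intermediate data is already split into key partitions with known sizes per cluster, that every inter-cluster link transfers in parallel at a fixed bandwidth, and that shuffle makespan is the objective. All names (`inter`, `bw`, `makespan`, `greedy_reducers`, `best_single_reducer`) and all numbers are illustrative.

```python
from typing import Dict, Tuple

# Hypothetical toy model (NOT the paper's GResearch/Geo-MR):
# inter[c][p] = MB of intermediate (map) output for key partition p held at cluster c
# bw[(a, b)]  = WAN bandwidth in MB/s from cluster a to cluster b
Inter = Dict[str, Dict[str, float]]
Bw = Dict[Tuple[str, str], float]
Plan = Dict[str, str]  # partition -> cluster acting as its global reducer


def makespan(inter: Inter, bw: Bw, plan: Plan) -> float:
    """Shuffle time of a (possibly partial) plan, assuming all WAN links run
    in parallel and each carries only the partitions routed over it."""
    load: Dict[Tuple[str, str], float] = {}
    for src, parts in inter.items():
        for p, mb in parts.items():
            dst = plan.get(p)
            # Only relevant data moves: unplaced partitions and data already
            # at its reducer never cross the WAN.
            if dst is None or dst == src or mb <= 0:
                continue
            load[(src, dst)] = load.get((src, dst), 0.0) + mb
    return max((mb / bw[link] for link, mb in load.items()), default=0.0)


def greedy_reducers(inter: Inter, bw: Bw) -> Plan:
    """Place the largest partitions first, each on the cluster that keeps
    the makespan of the plan built so far lowest."""
    totals: Dict[str, float] = {}
    for parts in inter.values():
        for p, mb in parts.items():
            totals[p] = totals.get(p, 0.0) + mb
    plan: Plan = {}
    for p in sorted(totals, key=totals.get, reverse=True):
        plan[p] = min(inter, key=lambda r: makespan(inter, bw, {**plan, p: r}))
    return plan


def best_single_reducer(inter: Inter, bw: Bw) -> Tuple[str, float]:
    """Baseline: ship every partition to one global reducer, pick the best one."""
    partitions = {p for parts in inter.values() for p in parts}
    return min(
        ((r, makespan(inter, bw, {p: r for p in partitions})) for r in inter),
        key=lambda pair: pair[1],
    )


# Illustrative three-cluster federation.
clusters = ["eu", "us", "asia"]
inter = {
    "eu": {"k0": 800.0, "k1": 50.0},
    "us": {"k0": 100.0, "k1": 600.0},
    "asia": {"k1": 100.0, "k2": 500.0},
}
bw = {(a, b): 100.0 for a in clusters for b in clusters if a != b}
bw[("eu", "asia")] = bw[("asia", "eu")] = 50.0

plan = greedy_reducers(inter, bw)
print(plan)                            # {'k0': 'eu', 'k1': 'us', 'k2': 'asia'}
print(makespan(inter, bw, plan))       # 1.0
print(best_single_reducer(inter, bw))  # ('us', 8.5)
```

In this toy instance the slowest shuffle link of the per-partition plan takes 1.0 s, versus 8.5 s for the best single global reducer, because each partition is collected where most of it already lives. A real scheduler would also have to weigh VM capacity and monetary cost, which this sketch ignores.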
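The highlights also mention an exact scheduling model, solved as a mixed-integer program and used as a benchmark for the heuristic. The paper's formulation is not reproduced here, so the sketch below substitutes plain exhaustive enumeration over a hypothetical toy objective (shuffle makespan, with each WAN link transferring in parallel at a fixed bandwidth). It is only feasible for tiny instances, but it plays the same role: it returns the true optimum against which a heuristic plan can be checked. The names `inter`, `bw`, `makespan`, and `exact_reducers` and the numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical toy model (NOT the paper's MIP):
# inter[c][p] = MB of intermediate output for partition p at cluster c
# bw[(a, b)]  = WAN bandwidth in MB/s from cluster a to cluster b


def makespan(inter, bw, plan):
    """Slowest link time when each partition p is gathered at plan[p]."""
    load = {}
    for src, parts in inter.items():
        for p, mb in parts.items():
            dst = plan[p]
            if dst != src and mb > 0:  # data already at its reducer stays put
                load[(src, dst)] = load.get((src, dst), 0.0) + mb
    return max((mb / bw[link] for link, mb in load.items()), default=0.0)


def exact_reducers(inter, bw):
    """Try all |C|^|P| partition -> reducer assignments; keep the best."""
    clusters = list(inter)
    partitions = sorted({p for parts in inter.values() for p in parts})
    best_plan, best_time = None, float("inf")
    for choice in product(clusters, repeat=len(partitions)):
        plan = dict(zip(partitions, choice))
        t = makespan(inter, bw, plan)
        if t < best_time:
            best_plan, best_time = plan, t
    return best_plan, best_time


clusters = ["eu", "us", "asia"]
inter = {
    "eu": {"k0": 800.0, "k1": 50.0},
    "us": {"k0": 100.0, "k1": 600.0},
    "asia": {"k1": 100.0, "k2": 500.0},
}
bw = {(a, b): 100.0 for a in clusters for b in clusters if a != b}
bw[("eu", "asia")] = bw[("asia", "eu")] = 50.0

print(exact_reducers(inter, bw))  # ({'k0': 'eu', 'k1': 'us', 'k2': 'asia'}, 1.0)
```

On this toy data the optimum routes k0 to eu, k1 to us, and k2 to asia, with a makespan of 1.0 s. Because the search space grows as |C|^|P|, an exact model of this kind is practical only as a benchmark on small instances.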