Global reduction for geo-distributed MapReduce across cloud federation

Geo-distributed big data processing is growing day by day: data originates in different countries and is held in datacenters (DCs) across the globe, and applications use different sites to increase reliability, security, and processing p...

Bibliographic Details
Published in: Future generation computer systems 2025-01, Vol.162, p.107492, Article 107492
Main Authors: Gouasmi, Thouraya; Kacem, Ahmed Hadj
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c185t-705739c3e00c345809185bd38373601828daaeffff6c431872229bdadccfa2a93
container_end_page
container_issue
container_start_page 107492
container_title Future generation computer systems
container_volume 162
creator Gouasmi, Thouraya
Kacem, Ahmed Hadj
description Geo-distributed big data processing is growing day by day: data originates in different countries and is held in datacenters (DCs) across the globe, and applications use different sites to increase reliability, security, and processing performance. Popular frameworks such as Hadoop and Spark have been redesigned to process geographically distributed data at its locations. However, these methods still transfer large amounts of data over the Internet, which leads to high processing time and cost for many applications, even though in several cases the output of the computation is smaller than its input. In this paper, we keep the data locality principle for processing data at different locations but abandon the principle of transferring all intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework across a federated cloud, based on two heuristic algorithms: (i) the first, GResearch, chooses the best clusters as global reducers to reduce communication and optimize bandwidth use; (ii) the second, Geo-MR, schedules only the relevant data to the selected global reducers that process the final results. As a baseline, we propose an exact MapReduce scheduling model for benchmarking, against which the Geo-MR heuristic results are compared and discussed. The experimental results show that the proposed Geo-MR algorithm can improve resource utilization (bandwidth and cluster VMs) of the cloud federation and consequently reduce cost and job response time.
•A heuristic algorithm, GResearch, is proposed to choose the best clusters as global reducers.
•A heuristic scheduling algorithm, Geo-MR, is proposed to schedule only the relevant data to the selected global reducers that process the final results. Geo-MR reduces communication and optimizes the transfer on bandwidth.
•An exact scheduling algorithm, formulated and solved as a mixed integer program, is proposed.
•Performance evaluation shows efficient results, reducing MapReduce job cost and job response time while ensuring optimal resource allocation.
doi_str_mv 10.1016/j.future.2024.107492
format article
fullrecord <record><control><sourceid>elsevier_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1016_j_future_2024_107492</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167739X24004485</els_id><sourcerecordid>S0167739X24004485</sourcerecordid><originalsourceid>FETCH-LOGICAL-c185t-705739c3e00c345809185bd38373601828daaeffff6c431872229bdadccfa2a93</originalsourceid><addsrcrecordid>eNp9j81KxDAUhbNQcBx9Axd5gdb8tE26EWRwxoERQRTchTS5kZQ6GZJW8O1NrWvv5sK59xzOh9ANJSUltLntSzeNU4SSEVZlSVQtO0OrfBKF4O37BbpMqSeEUMHpCm13Q-j0gCPYyYw-HLELEX9AKKxPY_TdNILFT_r0Mj8A1iaGlLAZwmSxAwtRz64rdO70kOD6b6_R2_bhdfNYHJ53-839oTBU1mMhSJ07GA6EGF7VkrRZ7iyXXPCGUMmk1RpcnsZUnErBGGs7q60xTjPd8jWqltzfGhGcOkX_qeO3okTN_KpXC7-a-dXCn213iw1yty8PUSXj4WjA-ghmVDb4_wN-ANcKaK4</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Global reduction for geo-distributed MapReduce across cloud federation</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>Gouasmi, Thouraya ; Kacem, Ahmed Hadj</creator><creatorcontrib>Gouasmi, Thouraya ; Kacem, Ahmed Hadj</creatorcontrib><description>Geo-distributed Bigdata processing is increasing day by day, resulting in the origins of data that are geographically distributed in different countries and hold datacenters (DCs) across the globe, and also the applications that use different sites to increase reliability, security, and processing performances. Most popular frameworks like Hadoop and Spark are re-designed to process geographically distributed data at their locations. However, these methods still suffer from a large amount of data transfer over the Internet, which prohibits a high processing time and cost for many applications, and in several cases, the output results of the computation are smaller than its inputs. 
In this paper, we keep the data locality principle for processing data at different locations but ignore the principle of transferring the entire intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework across federated cloud based on two heuristic algorithms: (i) chosen the best clusters as global reducers to reduce the communication and optimize the transfer on the bandwidth, GResearch. (ii) The second, Geo-MR, ensures the scheduling of only the relevant data to selected global reducers that process the final results. As a baseline, we propose an exact MapReduce scheduling model for benchmarking and to compare and discuss the Geo-MR heuristic algorithm results. The experimental results show that the proposed algorithm Geo-MR can improve resource (bandwidth and VMs of clusters) utilization of the cloud federation and consequently reduce cost and job response time. •An heuristic algorithm GResearch is proposed to choose the best clusters as global reducers.•An heuristic scheduling algorithm Geo-MR is proposed to ensure the scheduling of only the relevant data to selected global reducers that process the final results. 
Geo- MR reduces communication and optimizes the transfer on bandwidth.•An exact scheduling algorithm formulated and solved as a mixed integer program is proposed.•Performance evaluation proves efficient results reducing MapReduce job cost and job response time while ensuring optimal resource allocation.</description><identifier>ISSN: 0167-739X</identifier><identifier>DOI: 10.1016/j.future.2024.107492</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>BigData ; Cloud federation ; Cost optimization ; Geo-distributed scheduling ; MapReduce</subject><ispartof>Future generation computer systems, 2025-01, Vol.162, p.107492, Article 107492</ispartof><rights>2024 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c185t-705739c3e00c345809185bd38373601828daaeffff6c431872229bdadccfa2a93</cites><orcidid>0000-0002-8214-4862 ; 0000-0002-8895-0152</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27901,27902</link.rule.ids></links><search><creatorcontrib>Gouasmi, Thouraya</creatorcontrib><creatorcontrib>Kacem, Ahmed Hadj</creatorcontrib><title>Global reduction for geo-distributed MapReduce across cloud federation</title><title>Future generation computer systems</title><description>Geo-distributed Bigdata processing is increasing day by day, resulting in the origins of data that are geographically distributed in different countries and hold datacenters (DCs) across the globe, and also the applications that use different sites to increase reliability, security, and processing performances. Most popular frameworks like Hadoop and Spark are re-designed to process geographically distributed data at their locations. 
However, these methods still suffer from a large amount of data transfer over the Internet, which prohibits a high processing time and cost for many applications, and in several cases, the output results of the computation are smaller than its inputs. In this paper, we keep the data locality principle for processing data at different locations but ignore the principle of transferring the entire intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework across federated cloud based on two heuristic algorithms: (i) chosen the best clusters as global reducers to reduce the communication and optimize the transfer on the bandwidth, GResearch. (ii) The second, Geo-MR, ensures the scheduling of only the relevant data to selected global reducers that process the final results. As a baseline, we propose an exact MapReduce scheduling model for benchmarking and to compare and discuss the Geo-MR heuristic algorithm results. The experimental results show that the proposed algorithm Geo-MR can improve resource (bandwidth and VMs of clusters) utilization of the cloud federation and consequently reduce cost and job response time. •An heuristic algorithm GResearch is proposed to choose the best clusters as global reducers.•An heuristic scheduling algorithm Geo-MR is proposed to ensure the scheduling of only the relevant data to selected global reducers that process the final results. 
Geo- MR reduces communication and optimizes the transfer on bandwidth.•An exact scheduling algorithm formulated and solved as a mixed integer program is proposed.•Performance evaluation proves efficient results reducing MapReduce job cost and job response time while ensuring optimal resource allocation.</description><subject>BigData</subject><subject>Cloud federation</subject><subject>Cost optimization</subject><subject>Geo-distributed scheduling</subject><subject>MapReduce</subject><issn>0167-739X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNp9j81KxDAUhbNQcBx9Axd5gdb8tE26EWRwxoERQRTchTS5kZQ6GZJW8O1NrWvv5sK59xzOh9ANJSUltLntSzeNU4SSEVZlSVQtO0OrfBKF4O37BbpMqSeEUMHpCm13Q-j0gCPYyYw-HLELEX9AKKxPY_TdNILFT_r0Mj8A1iaGlLAZwmSxAwtRz64rdO70kOD6b6_R2_bhdfNYHJ53-839oTBU1mMhSJ07GA6EGF7VkrRZ7iyXXPCGUMmk1RpcnsZUnErBGGs7q60xTjPd8jWqltzfGhGcOkX_qeO3okTN_KpXC7-a-dXCn213iw1yty8PUSXj4WjA-ghmVDb4_wN-ANcKaK4</recordid><startdate>202501</startdate><enddate>202501</enddate><creator>Gouasmi, Thouraya</creator><creator>Kacem, Ahmed Hadj</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-8214-4862</orcidid><orcidid>https://orcid.org/0000-0002-8895-0152</orcidid></search><sort><creationdate>202501</creationdate><title>Global reduction for geo-distributed MapReduce across cloud federation</title><author>Gouasmi, Thouraya ; Kacem, Ahmed Hadj</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c185t-705739c3e00c345809185bd38373601828daaeffff6c431872229bdadccfa2a93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>BigData</topic><topic>Cloud federation</topic><topic>Cost optimization</topic><topic>Geo-distributed 
scheduling</topic><topic>MapReduce</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Gouasmi, Thouraya</creatorcontrib><creatorcontrib>Kacem, Ahmed Hadj</creatorcontrib><collection>CrossRef</collection><jtitle>Future generation computer systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Gouasmi, Thouraya</au><au>Kacem, Ahmed Hadj</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Global reduction for geo-distributed MapReduce across cloud federation</atitle><jtitle>Future generation computer systems</jtitle><date>2025-01</date><risdate>2025</risdate><volume>162</volume><spage>107492</spage><pages>107492-</pages><artnum>107492</artnum><issn>0167-739X</issn><abstract>Geo-distributed Bigdata processing is increasing day by day, resulting in the origins of data that are geographically distributed in different countries and hold datacenters (DCs) across the globe, and also the applications that use different sites to increase reliability, security, and processing performances. Most popular frameworks like Hadoop and Spark are re-designed to process geographically distributed data at their locations. However, these methods still suffer from a large amount of data transfer over the Internet, which prohibits a high processing time and cost for many applications, and in several cases, the output results of the computation are smaller than its inputs. In this paper, we keep the data locality principle for processing data at different locations but ignore the principle of transferring the entire intermediate results to a single global reducer. We propose Geo-MR, an intelligent geo-distributed MapReduce-based framework across federated cloud based on two heuristic algorithms: (i) chosen the best clusters as global reducers to reduce the communication and optimize the transfer on the bandwidth, GResearch. 
(ii) The second, Geo-MR, ensures the scheduling of only the relevant data to selected global reducers that process the final results. As a baseline, we propose an exact MapReduce scheduling model for benchmarking and to compare and discuss the Geo-MR heuristic algorithm results. The experimental results show that the proposed algorithm Geo-MR can improve resource (bandwidth and VMs of clusters) utilization of the cloud federation and consequently reduce cost and job response time. •An heuristic algorithm GResearch is proposed to choose the best clusters as global reducers.•An heuristic scheduling algorithm Geo-MR is proposed to ensure the scheduling of only the relevant data to selected global reducers that process the final results. Geo- MR reduces communication and optimizes the transfer on bandwidth.•An exact scheduling algorithm formulated and solved as a mixed integer program is proposed.•Performance evaluation proves efficient results reducing MapReduce job cost and job response time while ensuring optimal resource allocation.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.future.2024.107492</doi><orcidid>https://orcid.org/0000-0002-8214-4862</orcidid><orcidid>https://orcid.org/0000-0002-8895-0152</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0167-739X
ispartof Future generation computer systems, 2025-01, Vol.162, p.107492, Article 107492
issn 0167-739X
language eng
recordid cdi_crossref_primary_10_1016_j_future_2024_107492
source ScienceDirect Freedom Collection 2022-2024
subjects BigData
Cloud federation
Cost optimization
Geo-distributed scheduling
MapReduce
title Global reduction for geo-distributed MapReduce across cloud federation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-31T08%3A09%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-elsevier_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Global%20reduction%20for%20geo-distributed%20MapReduce%20across%20cloud%20federation&rft.jtitle=Future%20generation%20computer%20systems&rft.au=Gouasmi,%20Thouraya&rft.date=2025-01&rft.volume=162&rft.spage=107492&rft.pages=107492-&rft.artnum=107492&rft.issn=0167-739X&rft_id=info:doi/10.1016/j.future.2024.107492&rft_dat=%3Celsevier_cross%3ES0167739X24004485%3C/elsevier_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c185t-705739c3e00c345809185bd38373601828daaeffff6c431872229bdadccfa2a93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true