SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing

Bibliographic Details
Published in: IEEE transactions on cloud computing 2023-01, Vol.11 (1), p.911-926
Main Authors: Chen, Yiting; Luo, Lailong; Guo, Deke; Rottenstreich, Ori; Wu, Jie
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3
cites cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3
container_end_page 926
container_issue 1
container_start_page 911
container_title IEEE transactions on cloud computing
container_volume 11
creator Chen, Yiting
Luo, Lailong
Guo, Deke
Rottenstreich, Ori
Wu, Jie
description For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.
doi_str_mv 10.1109/TCC.2021.3119991
format article
fullrecord <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2784554609</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9574634</ieee_id><sourcerecordid>2784554609</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</originalsourceid><addsrcrecordid>eNo9kE1rwzAMhs3YYKXrfbBLYOd0Vpw48W4h3RcUVmjGYBfj2MqWkiadnRz67-eSMl0k0KMX8RByC3QJQMVDWRTLiEawZABCCLggs4ilUUgpZJd-Bp6FKXC4JgvndtRXloAAMSNf21W5eQxyrbFFq4am-w4-G4NhblEFKzWoIO9Uexwa7fxi-Am2zX5sB9VhP7oJKK3qXI02UJ0JNrbX6JzPuSFXtWodLs59Tj6en8riNVy_v7wV-TrUjLEh1AlLs7iCzABHo43SNDECWURFJdKqZoCcG8NiTHTNuUqpQlorxTKagcKKzcn9lHuw_e-IbpC7frT-aScjn5wkMafCU3SitO2ds1jLg232yh4lUHmSKL1EeZIozxL9yd100iDiPy6SNOYsZn9JNmzv</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2784554609</pqid></control><display><type>article</type><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</creator><creatorcontrib>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</creatorcontrib><description>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. 
To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</description><identifier>ISSN: 2168-7161</identifier><identifier>EISSN: 2372-0018</identifier><identifier>DOI: 10.1109/TCC.2021.3119991</identifier><identifier>CODEN: ITCCF6</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Bandwidth ; Bandwidths ; Computation ; Data analysis ; Data centers ; Data transfer (computers) ; dynamic network ; job parallelism ; job response time ; Mathematical analysis ; Parallel processing ; Silicon ; Task analysis ; task scheduling ; Time factors ; Wide area networks ; Wide-area data analytics</subject><ispartof>IEEE transactions on cloud computing, 2023-01, Vol.11 (1), p.911-926</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</citedby><cites>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</cites><orcidid>0000-0002-3472-1717 ; 0000-0002-7999-4532 ; 0000-0003-4894-5540 ; 0000-0002-4886-9974 ; 0000-0002-4064-1238</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9574634$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,27901,27902,54771</link.rule.ids></links><search><creatorcontrib>Chen, Yiting</creatorcontrib><creatorcontrib>Luo, Lailong</creatorcontrib><creatorcontrib>Guo, Deke</creatorcontrib><creatorcontrib>Rottenstreich, Ori</creatorcontrib><creatorcontrib>Wu, Jie</creatorcontrib><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><title>IEEE transactions on cloud computing</title><addtitle>TCC</addtitle><description>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. 
To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</description><subject>Bandwidth</subject><subject>Bandwidths</subject><subject>Computation</subject><subject>Data analysis</subject><subject>Data centers</subject><subject>Data transfer (computers)</subject><subject>dynamic network</subject><subject>job parallelism</subject><subject>job response time</subject><subject>Mathematical analysis</subject><subject>Parallel processing</subject><subject>Silicon</subject><subject>Task analysis</subject><subject>task scheduling</subject><subject>Time factors</subject><subject>Wide area networks</subject><subject>Wide-area data analytics</subject><issn>2168-7161</issn><issn>2372-0018</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><recordid>eNo9kE1rwzAMhs3YYKXrfbBLYOd0Vpw48W4h3RcUVmjGYBfj2MqWkiadnRz67-eSMl0k0KMX8RByC3QJQMVDWRTLiEawZABCCLggs4ilUUgpZJd-Bp6FKXC4JgvndtRXloAAMSNf21W5eQxyrbFFq4am-w4-G4NhblEFKzWoIO9Uexwa7fxi-Am2zX5sB9VhP7oJKK3qXI02UJ0JNrbX6JzPuSFXtWodLs59Tj6en8riNVy_v7wV-TrUjLEh1AlLs7iCzABHo43SNDECWURFJdKqZoCcG8NiTHTNuUqpQlorxTKagcKKzcn9lHuw_e-IbpC7frT-aScjn5wkMafCU3SitO2ds1jLg232yh4lUHmSKL1EeZIozxL9yd100iDiPy6SNOYsZn9JNmzv</recordid><startdate>202301</startdate><enddate>202301</enddate><creator>Chen, 
Yiting</creator><creator>Luo, Lailong</creator><creator>Guo, Deke</creator><creator>Rottenstreich, Ori</creator><creator>Wu, Jie</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3472-1717</orcidid><orcidid>https://orcid.org/0000-0002-7999-4532</orcidid><orcidid>https://orcid.org/0000-0003-4894-5540</orcidid><orcidid>https://orcid.org/0000-0002-4886-9974</orcidid><orcidid>https://orcid.org/0000-0002-4064-1238</orcidid></search><sort><creationdate>202301</creationdate><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><author>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bandwidth</topic><topic>Bandwidths</topic><topic>Computation</topic><topic>Data analysis</topic><topic>Data centers</topic><topic>Data transfer (computers)</topic><topic>dynamic network</topic><topic>job parallelism</topic><topic>job response time</topic><topic>Mathematical analysis</topic><topic>Parallel processing</topic><topic>Silicon</topic><topic>Task analysis</topic><topic>task scheduling</topic><topic>Time factors</topic><topic>Wide area networks</topic><topic>Wide-area data analytics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Yiting</creatorcontrib><creatorcontrib>Luo, Lailong</creatorcontrib><creatorcontrib>Guo, 
Deke</creatorcontrib><creatorcontrib>Rottenstreich, Ori</creatorcontrib><creatorcontrib>Wu, Jie</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on cloud computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Yiting</au><au>Luo, Lailong</au><au>Guo, Deke</au><au>Rottenstreich, Ori</au><au>Wu, Jie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</atitle><jtitle>IEEE transactions on cloud computing</jtitle><stitle>TCC</stitle><date>2023-01</date><risdate>2023</risdate><volume>11</volume><issue>1</issue><spage>911</spage><epage>926</epage><pages>911-926</pages><issn>2168-7161</issn><eissn>2372-0018</eissn><coden>ITCCF6</coden><abstract>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. 
The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TCC.2021.3119991</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-3472-1717</orcidid><orcidid>https://orcid.org/0000-0002-7999-4532</orcidid><orcidid>https://orcid.org/0000-0003-4894-5540</orcidid><orcidid>https://orcid.org/0000-0002-4886-9974</orcidid><orcidid>https://orcid.org/0000-0002-4064-1238</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2168-7161
ispartof IEEE transactions on cloud computing, 2023-01, Vol.11 (1), p.911-926
issn 2168-7161
2372-0018
language eng
recordid cdi_proquest_journals_2784554609
source IEEE Electronic Library (IEL) Journals
subjects Bandwidth
Bandwidths
Computation
Data analysis
Data centers
Data transfer (computers)
dynamic network
job parallelism
job response time
Mathematical analysis
Parallel processing
Silicon
Task analysis
task scheduling
Time factors
Wide area networks
Wide-area data analytics
title SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A28%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SDTP:%20Accelerating%20Wide-Area%20Data%20Analytics%20With%20Simultaneous%20Data%20Transfer%20and%20Processing&rft.jtitle=IEEE%20transactions%20on%20cloud%20computing&rft.au=Chen,%20Yiting&rft.date=2023-01&rft.volume=11&rft.issue=1&rft.spage=911&rft.epage=926&rft.pages=911-926&rft.issn=2168-7161&rft.eissn=2372-0018&rft.coden=ITCCF6&rft_id=info:doi/10.1109/TCC.2021.3119991&rft_dat=%3Cproquest_ieee_%3E2784554609%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2784554609&rft_id=info:pmid/&rft_ieee_id=9574634&rfr_iscdi=true