SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing
For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic meth...
Published in: | IEEE transactions on cloud computing 2023-01, Vol.11 (1), p.911-926 |
---|---|
Main Authors: | Chen, Yiting; Luo, Lailong; Guo, Deke; Rottenstreich, Ori; Wu, Jie |
Format: | Article |
Language: | English |
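Subjects: | Bandwidth; Bandwidths; Computation; Data analysis; Data centers; Data transfer (computers); dynamic network; job parallelism; job response time; Mathematical analysis; Parallel processing; Silicon; Task analysis; task scheduling; Time factors; Wide area networks; Wide-area data analytics |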
cited_by | cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3 |
---|---|
cites | cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3 |
container_end_page | 926 |
container_issue | 1 |
container_start_page | 911 |
container_title | IEEE transactions on cloud computing |
container_volume | 11 |
creator | Chen, Yiting; Luo, Lailong; Guo, Deke; Rottenstreich, Ori; Wu, Jie |
description | For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods. |
doi_str_mv | 10.1109/TCC.2021.3119991 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_proquest_journals_2784554609</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9574634</ieee_id><sourcerecordid>2784554609</sourcerecordid><originalsourceid>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</originalsourceid><addsrcrecordid>eNo9kE1rwzAMhs3YYKXrfbBLYOd0Vpw48W4h3RcUVmjGYBfj2MqWkiadnRz67-eSMl0k0KMX8RByC3QJQMVDWRTLiEawZABCCLggs4ilUUgpZJd-Bp6FKXC4JgvndtRXloAAMSNf21W5eQxyrbFFq4am-w4-G4NhblEFKzWoIO9Uexwa7fxi-Am2zX5sB9VhP7oJKK3qXI02UJ0JNrbX6JzPuSFXtWodLs59Tj6en8riNVy_v7wV-TrUjLEh1AlLs7iCzABHo43SNDECWURFJdKqZoCcG8NiTHTNuUqpQlorxTKagcKKzcn9lHuw_e-IbpC7frT-aScjn5wkMafCU3SitO2ds1jLg232yh4lUHmSKL1EeZIozxL9yd100iDiPy6SNOYsZn9JNmzv</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2784554609</pqid></control><display><type>article</type><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</creator><creatorcontrib>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</creatorcontrib><description>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. 
To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</description><identifier>ISSN: 2168-7161</identifier><identifier>EISSN: 2372-0018</identifier><identifier>DOI: 10.1109/TCC.2021.3119991</identifier><identifier>CODEN: ITCCF6</identifier><language>eng</language><publisher>Piscataway: IEEE</publisher><subject>Bandwidth ; Bandwidths ; Computation ; Data analysis ; Data centers ; Data transfer (computers) ; dynamic network ; job parallelism ; job response time ; Mathematical analysis ; Parallel processing ; Silicon ; Task analysis ; task scheduling ; Time factors ; Wide area networks ; Wide-area data analytics</subject><ispartof>IEEE transactions on cloud computing, 2023-01, Vol.11 (1), p.911-926</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</citedby><cites>FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</cites><orcidid>0000-0002-3472-1717 ; 0000-0002-7999-4532 ; 0000-0003-4894-5540 ; 0000-0002-4886-9974 ; 0000-0002-4064-1238</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9574634$$EHTML$$P50$$Gieee$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,27901,27902,54771</link.rule.ids></links><search><creatorcontrib>Chen, Yiting</creatorcontrib><creatorcontrib>Luo, Lailong</creatorcontrib><creatorcontrib>Guo, Deke</creatorcontrib><creatorcontrib>Rottenstreich, Ori</creatorcontrib><creatorcontrib>Wu, Jie</creatorcontrib><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><title>IEEE transactions on cloud computing</title><addtitle>TCC</addtitle><description>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. 
To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</description><subject>Bandwidth</subject><subject>Bandwidths</subject><subject>Computation</subject><subject>Data analysis</subject><subject>Data centers</subject><subject>Data transfer (computers)</subject><subject>dynamic network</subject><subject>job parallelism</subject><subject>job response time</subject><subject>Mathematical analysis</subject><subject>Parallel processing</subject><subject>Silicon</subject><subject>Task analysis</subject><subject>task scheduling</subject><subject>Time factors</subject><subject>Wide area networks</subject><subject>Wide-area data analytics</subject><issn>2168-7161</issn><issn>2372-0018</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>ESBDL</sourceid><recordid>eNo9kE1rwzAMhs3YYKXrfbBLYOd0Vpw48W4h3RcUVmjGYBfj2MqWkiadnRz67-eSMl0k0KMX8RByC3QJQMVDWRTLiEawZABCCLggs4ilUUgpZJd-Bp6FKXC4JgvndtRXloAAMSNf21W5eQxyrbFFq4am-w4-G4NhblEFKzWoIO9Uexwa7fxi-Am2zX5sB9VhP7oJKK3qXI02UJ0JNrbX6JzPuSFXtWodLs59Tj6en8riNVy_v7wV-TrUjLEh1AlLs7iCzABHo43SNDECWURFJdKqZoCcG8NiTHTNuUqpQlorxTKagcKKzcn9lHuw_e-IbpC7frT-aScjn5wkMafCU3SitO2ds1jLg232yh4lUHmSKL1EeZIozxL9yd100iDiPy6SNOYsZn9JNmzv</recordid><startdate>202301</startdate><enddate>202301</enddate><creator>Chen, 
Yiting</creator><creator>Luo, Lailong</creator><creator>Guo, Deke</creator><creator>Rottenstreich, Ori</creator><creator>Wu, Jie</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>ESBDL</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0000-0002-3472-1717</orcidid><orcidid>https://orcid.org/0000-0002-7999-4532</orcidid><orcidid>https://orcid.org/0000-0003-4894-5540</orcidid><orcidid>https://orcid.org/0000-0002-4886-9974</orcidid><orcidid>https://orcid.org/0000-0002-4064-1238</orcidid></search><sort><creationdate>202301</creationdate><title>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</title><author>Chen, Yiting ; Luo, Lailong ; Guo, Deke ; Rottenstreich, Ori ; Wu, Jie</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Bandwidth</topic><topic>Bandwidths</topic><topic>Computation</topic><topic>Data analysis</topic><topic>Data centers</topic><topic>Data transfer (computers)</topic><topic>dynamic network</topic><topic>job parallelism</topic><topic>job response time</topic><topic>Mathematical analysis</topic><topic>Parallel processing</topic><topic>Silicon</topic><topic>Task analysis</topic><topic>task scheduling</topic><topic>Time factors</topic><topic>Wide area networks</topic><topic>Wide-area data analytics</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Yiting</creatorcontrib><creatorcontrib>Luo, Lailong</creatorcontrib><creatorcontrib>Guo, 
Deke</creatorcontrib><creatorcontrib>Rottenstreich, Ori</creatorcontrib><creatorcontrib>Wu, Jie</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005–Present</collection><collection>IEEE Xplore Open Access Journals</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Xplore</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>IEEE transactions on cloud computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Yiting</au><au>Luo, Lailong</au><au>Guo, Deke</au><au>Rottenstreich, Ori</au><au>Wu, Jie</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing</atitle><jtitle>IEEE transactions on cloud computing</jtitle><stitle>TCC</stitle><date>2023-01</date><risdate>2023</risdate><volume>11</volume><issue>1</issue><spage>911</spage><epage>926</epage><pages>911-926</pages><issn>2168-7161</issn><eissn>2372-0018</eissn><coden>ITCCF6</coden><abstract>For the efficient analysis of geo-distributed datasets, cloud providers implement data-parallel jobs across geo-distributed sites (e.g., datacenters and edge clusters), which are generally interconnected by wide-area network links. However, current state-of-the-art geo-distributed data analytic methods fail to make full use of the available network and computing resources. 
The main reason is that such geo-distributed methods must wait for bottleneck sites to complete the corresponding transmission and computation in each phase. Furthermore, such geo-distributed methods may be impractical to the network bandwidth dynamicity and diverse job parallelism. To this end, we propose a Simultaneous Data Transfer and Processing (SDTP) mechanism to accelerate wide-area data analytics, with the joint consideration of network bandwidth dynamics and job parallelism. In the SDTP, a site can execute the computation, provided that it obtains the required input data. As a result, the input data loading, map, shuffle, and reduce phases at each site need not wait for the completion of the previous phases of other sites. We further improve the SDTP method by offering more accurate time estimation and generalizing the mechanism to dynamic situations. The trace-driven results demonstrate that SDTP can improve the wide-area analytic job response time by 19% to 72% compared to other methods.</abstract><cop>Piscataway</cop><pub>IEEE</pub><doi>10.1109/TCC.2021.3119991</doi><tpages>16</tpages><orcidid>https://orcid.org/0000-0002-3472-1717</orcidid><orcidid>https://orcid.org/0000-0002-7999-4532</orcidid><orcidid>https://orcid.org/0000-0003-4894-5540</orcidid><orcidid>https://orcid.org/0000-0002-4886-9974</orcidid><orcidid>https://orcid.org/0000-0002-4064-1238</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2168-7161 |
ispartof | IEEE transactions on cloud computing, 2023-01, Vol.11 (1), p.911-926 |
issn | 2168-7161 (ISSN); 2372-0018 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2784554609 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Bandwidth; Bandwidths; Computation; Data analysis; Data centers; Data transfer (computers); dynamic network; job parallelism; job response time; Mathematical analysis; Parallel processing; Silicon; Task analysis; task scheduling; Time factors; Wide area networks; Wide-area data analytics |
title | SDTP: Accelerating Wide-Area Data Analytics With Simultaneous Data Transfer and Processing |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T03%3A28%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SDTP:%20Accelerating%20Wide-Area%20Data%20Analytics%20With%20Simultaneous%20Data%20Transfer%20and%20Processing&rft.jtitle=IEEE%20transactions%20on%20cloud%20computing&rft.au=Chen,%20Yiting&rft.date=2023-01&rft.volume=11&rft.issue=1&rft.spage=911&rft.epage=926&rft.pages=911-926&rft.issn=2168-7161&rft.eissn=2372-0018&rft.coden=ITCCF6&rft_id=info:doi/10.1109/TCC.2021.3119991&rft_dat=%3Cproquest_ieee_%3E2784554609%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c333t-c53784b18d16edcdac05d9e3209b97bf31e66dd34e5cf66a70ae0faa38081aeb3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2784554609&rft_id=info:pmid/&rft_ieee_id=9574634&rfr_iscdi=true |