
Optimizing Geo-Distributed Data Analytics with Coordinated Task Scheduling and Routing

Recent trends show that cloud computing is growing to span more and more globally distributed datacenters. For geo-distributed datacenters, there is an increasing need for scheduling algorithms that place tasks across datacenters by jointly considering WAN traffic and computation. Such scheduling must cope with wide-area distributed data, data sharing, WAN bandwidth costs, and datacenter capacity limits, while also minimizing makespan; the resulting scheduling problem is NP-hard. We propose a new resource allocation algorithm called HPS+, an extension of Hypergraph Partition-based Scheduling. HPS+ models the combined task-data and data-datacenter dependencies as an augmented hypergraph and adopts an improved hypergraph partitioning technique to minimize WAN traffic. It further uses a coordination mechanism that allocates network resources in line with task requirements in order to minimize the makespan. Evaluation on the real China-Astronomy-Cloud model and a Google datacenter model shows that HPS+ reduces the amount of data transferred by up to 53 percent and the makespan by 39 percent compared to existing algorithms.
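The abstract describes HPS+ only at a high level. As a rough illustration of the hypergraph view it mentions, the Python sketch below (not taken from the paper; every name, data size, and the cost metric are assumptions made here) treats tasks as vertices and each shared data block as a hyperedge linking its reader tasks to the datacenter that hosts it, then estimates the WAN traffic a candidate task-to-datacenter assignment would incur.

```python
# Illustrative sketch only -- not the paper's HPS+ implementation. Tasks are
# hypergraph vertices; each shared data block acts as a hyperedge spanning the
# tasks that read it, "augmented" with the datacenter that already hosts it.
# The WAN cost of an assignment is estimated in the spirit of a hyperedge
# connectivity cut metric: a block must be copied over the WAN once for every
# datacenter, other than its home site, where some reader task is placed.

def wan_traffic(assignment, data_blocks):
    """Estimate total cross-datacenter transfer volume in GB.

    assignment:  dict task -> datacenter the task is placed in
    data_blocks: dict block -> (size_gb, home_datacenter, list of reader tasks)
    """
    total = 0.0
    for size_gb, home_dc, readers in data_blocks.values():
        # Datacenters that need a local copy of this block.
        sites = {assignment[t] for t in readers}
        # The home datacenter already holds the block; each other site
        # in the connectivity set costs one WAN transfer of the block.
        total += size_gb * len(sites - {home_dc})
    return total

# Hypothetical toy instance: 4 tasks, 3 data blocks, 2 datacenters.
data_blocks = {
    "b1": (10.0, "dc_beijing", ["t1", "t2"]),
    "b2": (4.0,  "dc_oregon",  ["t2", "t3"]),
    "b3": (6.0,  "dc_beijing", ["t3", "t4"]),
}

spread    = {"t1": "dc_oregon",  "t2": "dc_oregon",
             "t3": "dc_beijing", "t4": "dc_oregon"}
colocated = {"t1": "dc_beijing", "t2": "dc_beijing",
             "t3": "dc_beijing", "t4": "dc_beijing"}

print(wan_traffic(spread, data_blocks))     # 10 + 4 + 6 = 20.0 GB over the WAN
print(wan_traffic(colocated, data_blocks))  # only b2 leaves its home site: 4.0 GB
```

A partitioner in the spirit of hypergraph-partition-based scheduling would search over such assignments, subject to datacenter capacity limits, to drive this cut-style cost down while also keeping the makespan low; per the abstract, HPS+ additionally coordinates network resource allocation with the resulting task requirements.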

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2020-02, Vol. 31 (2), p. 279-293
Main Authors: Zhao, Laiping; Yang, Yanan; Munir, Ali; Liu, Alex X.; Li, Yue; Qu, Wenyu
Format: Article
Language: English
Subjects: Algorithms; Astronomy; Bandwidth; Cloud computing; Data Analytics; Data centers; Data retrieval; Data transfer; Geo-distributed Cloud; Graph theory; Graphs; Partitions; Processor scheduling; Production scheduling; Resource allocation; Routing; Scheduling; Task analysis; Task Scheduling; Wide area networks
ISSN: 1045-9219
EISSN: 1558-2183
DOI: 10.1109/TPDS.2019.2938164
Publisher: IEEE, New York
Source: IEEE Electronic Library (IEL) Journals
Online Access: Get full text