Optimizing Geo-Distributed Data Analytics with Coordinated Task Scheduling and Routing
Published in: | IEEE transactions on parallel and distributed systems 2020-02, Vol.31 (2), p.279-293 |
---|---|
Format: | Article |
Language: | English |
Summary: | Recent trends show that cloud computing is growing to span more and more globally distributed datacenters. For geo-distributed datacenters, there is an increasing need for scheduling algorithms that place tasks across datacenters by jointly considering WAN traffic and computation. This scheduling must deal with wide-area distributed data, data sharing, WAN bandwidth costs, and datacenter capacity limits, while also minimizing makespan. However, this scheduling problem is NP-hard. We propose a new resource allocation algorithm called HPS+, an extension of Hypergraph Partition-based Scheduling. HPS+ models the combined task-data dependencies and data-datacenter dependencies as an augmented hypergraph and adopts an improved hypergraph partition technique to minimize WAN traffic. It further uses a coordination mechanism that allocates network resources according to task requirements in order to minimize the makespan. Evaluations on the real China-Astronomy-Cloud model and the Google datacenter model show that HPS+ reduces the amount of data transferred by up to 53 percent and reduces the makespan by 39 percent compared to existing algorithms. |
ISSN: | 1045-9219, 1558-2183 |
DOI: | 10.1109/TPDS.2019.2938164 |
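
The summary describes modeling task-data and data-datacenter dependencies as a hypergraph and partitioning it to minimize WAN traffic. The following minimal Python sketch illustrates only that cost model, under stated assumptions: each data item is treated as a hyperedge joining its home datacenter and the tasks that read it, and an item is assumed to be shipped once to every remote datacenter hosting a reader. The function names (`wan_traffic`, `greedy_placement`), the toy data, and the greedy baseline are all hypothetical illustrations; this is not the HPS+ algorithm and it does not model makespan or bandwidth allocation.

```python
from collections import defaultdict


def wan_traffic(data_home, data_size, task_inputs, placement):
    """Total data moved across datacenters for a given task placement.

    Assumption: a data item is transferred once to each remote datacenter
    that hosts at least one task reading it (shared data is not re-sent).
    """
    destinations = defaultdict(set)
    for task, inputs in task_inputs.items():
        for item in inputs:
            destinations[item].add(placement[task])
    total = 0
    for item, dcs in destinations.items():
        remote = dcs - {data_home[item]}  # datacenters that must fetch the item
        total += data_size[item] * len(remote)
    return total


def greedy_placement(data_home, data_size, task_inputs, capacity):
    """Naive baseline (not HPS+): put each task where most of its input
    bytes already reside, subject to per-datacenter task slots."""
    free = dict(capacity)
    placement = {}
    for task, inputs in task_inputs.items():
        local_bytes = defaultdict(int)
        for item in inputs:
            local_bytes[data_home[item]] += data_size[item]
        # Prefer datacenters with more local input; break ties by name.
        candidates = sorted(free, key=lambda dc: (-local_bytes[dc], dc))
        chosen = next(dc for dc in candidates if free[dc] > 0)
        placement[task] = chosen
        free[chosen] -= 1
    return placement


# Hypothetical toy instance: two datacenters, three data items, three tasks.
data_home = {"d1": "A", "d2": "A", "d3": "B"}
data_size = {"d1": 10, "d2": 5, "d3": 8}
task_inputs = {"t1": ["d1", "d2"], "t2": ["d1", "d3"], "t3": ["d3"]}
capacity = {"A": 2, "B": 2}

placement = greedy_placement(data_home, data_size, task_inputs, capacity)
print(placement, wan_traffic(data_home, data_size, task_inputs, placement))
```

On this toy instance the greedy baseline places `t1` and `t2` in datacenter `A` and `t3` in `B`, so only `d3` crosses the WAN. Moving `t2` to `B` instead would ship `d1`, which is larger, so the placement changes the traffic even though the tasks and data are the same. A hypergraph partitioner searches over such placements globally rather than task by task.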