Loading…

Enhancing Reliability via Checkpointing in Cloud Computing Systems

Cloud computing is becoming an important solution for providing scalable computing resources via Internet. Because there are tens of thousands of nodes in data center, the probability of server failures is nontrivial. Therefore, it is a critical challenge to guarantee the service reliability. Fault-...

Full description

Saved in:
Bibliographic Details
Published in:China communications 2017-07, Vol.14 (7), p.108-117
Main Authors: Zhou, Ao, Sun, Qibo, Li, Jinglin
Format: Article
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Cloud computing is becoming an important solution for providing scalable computing resources via Internet. Because there are tens of thousands of nodes in data center, the probability of server failures is nontrivial. Therefore, it is a critical challenge to guarantee the service reliability. Fault-tolerance strategies, such as checkpoint, are commonly employed. Because of the failure of the edge switches, the checkpoint image may become inaccessible. Therefore, current checkpoint-based fault tolerance method cannot achieve the best effect. In this paper, we propose an optimal checkpoint method with edge switch failure-aware. The edge switch failure-aware checkpoint method includes two algorithms. The first algorithm employs the data center topology and communication characteristic for checkpoint image storage server selection. The second algorithm employs the checkpoint image storage characteristic as well as the data center topology to select the recovery server. Simulation experiments are performed to demonstrate the effectiveness of the proposed method.
ISSN:1673-5447
DOI:10.1109/CC.2017.8010962