Loading…
Enhancing Reliability via Checkpointing in Cloud Computing Systems
Cloud computing is becoming an important solution for providing scalable computing resources via Internet. Because there are tens of thousands of nodes in data center, the probability of server failures is nontrivial. Therefore, it is a critical challenge to guarantee the service reliability. Fault-...
Saved in:
Published in: | China communications 2017-07, Vol.14 (7), p.108-117 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Cloud computing is becoming an important solution for providing scalable computing resources via Internet. Because there are tens of thousands of nodes in data center, the probability of server failures is nontrivial. Therefore, it is a critical challenge to guarantee the service reliability. Fault-tolerance strategies, such as checkpoint, are commonly employed. Because of the failure of the edge switches, the checkpoint image may become inaccessible. Therefore, current checkpoint-based fault tolerance method cannot achieve the best effect. In this paper, we propose an optimal checkpoint method with edge switch failure-aware. The edge switch failure-aware checkpoint method includes two algorithms. The first algorithm employs the data center topology and communication characteristic for checkpoint image storage server selection. The second algorithm employs the checkpoint image storage characteristic as well as the data center topology to select the recovery server. Simulation experiments are performed to demonstrate the effectiveness of the proposed method. |
---|---|
ISSN: | 1673-5447 |
DOI: | 10.1109/CC.2017.8010962 |