Parallelizing Checkpoint for Faster Fault Tolerance
Published in: | Journal of physics. Conference series 2018-08, Vol.1069 (1), p.12061 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Modern systems are prone to errors, which calls for fault tolerance mechanisms. Traditional checkpoint-based fault tolerance mechanisms introduce large overhead that is sometimes unacceptable. This paper introduces parallel checkpoint, a highly efficient checkpoint mechanism for fault tolerance in multi-threaded programs. By eliminating the global barrier, parallelizing the threads' checkpoint phases, and overlapping each thread's computing phase with its checkpoint phase, we achieve a large performance gain (3.16x on average) and much better scalability than the previous checkpoint mechanism. |
---|---|
ISSN: | 1742-6588 1742-6596 |
DOI: | 10.1088/1742-6596/1069/1/012061 |
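
The three ideas named in the summary (no global barrier, per-thread checkpoints taken in parallel, and checkpoint writes overlapped with computing) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the names `run_worker`, `save`, and `run_all` are hypothetical, the "computation" is a stand-in sum, and the sketch assumes each thread's state is independent, so no cross-thread consistency protocol is needed.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor


def save(store, lock, snapshot):
    """Record a snapshot, keeping only the newest one per worker (hypothetical in-memory store)."""
    with lock:
        prev = store.get(snapshot["worker"])
        if prev is None or prev["step"] < snapshot["step"]:
            store[snapshot["worker"]] = snapshot


def run_worker(worker_id, steps, interval, store, lock, writer):
    """Compute for `steps` iterations, checkpointing this worker's own state every `interval` steps."""
    state = {"worker": worker_id, "step": 0, "total": 0}
    pending = []
    for step in range(1, steps + 1):
        state["total"] += step * (worker_id + 1)  # stand-in for real computation
        state["step"] = step
        if step % interval == 0:
            # Private copy of this thread's state: no barrier with other workers.
            snapshot = copy.deepcopy(state)
            # Hand the snapshot to a background writer so computing resumes immediately.
            pending.append(writer.submit(save, store, lock, snapshot))
    for fut in pending:
        fut.result()  # make sure this worker's checkpoints are stored before it exits
    return state


def run_all(num_workers=4, steps=10, interval=5):
    """Run all workers concurrently; return their final states and the checkpoint store."""
    store, lock = {}, threading.Lock()
    with ThreadPoolExecutor(max_workers=2) as writer, \
            ThreadPoolExecutor(max_workers=num_workers) as workers:
        futures = [
            workers.submit(run_worker, w, steps, interval, store, lock, writer)
            for w in range(num_workers)
        ]
        finals = [f.result() for f in futures]
    return finals, store
```

Calling `run_all(4, 10, 5)` leaves one stored snapshot per worker. Each worker only copies its own state and then continues, so a slow worker never stalls the others the way a global barrier would. A real mechanism for threads that share memory would also need a way to keep the per-thread snapshots mutually consistent, which this sketch deliberately leaves out.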