
Dynamic node management and measure estimation in a state-driven fault injector

Bibliographic Details
Main Authors: Chandra, R., Cukier, M., Lefever, R.M., Sanders, W.H.
Format: Conference Proceeding
Language: English
Description
Summary: Validation of distributed systems using fault injection is difficult because of their inherent complexity, lack of a global clock, and lack of an easily accessible notion of a global state. To address these challenges, the Loki fault injector injects faults based on a partial view of the global state of a distributed system, and performs a post-runtime analysis using an off-line clock synchronization algorithm to determine whether the faults were properly injected. In this paper, we first describe an enhanced runtime architecture for the Loki fault injector and then present a new method for obtaining measures in Loki. The enhanced runtime allows dynamic entry and exit of nodes in the system. It also offers more efficient multicast of notification messages and more efficient communication between state machines on the same host, and is more scalable than the previous runtime. We then detail a new and flexible method for obtaining a wide range of performance and dependability measures in Loki.
ISSN: 1060-9857
DOI: 10.1109/RELDI.2000.885412
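The abstract does not spell out how Loki's post-runtime analysis decides whether a fault was properly injected. As a rough illustration only, the Python sketch below shows the general kind of check such an analysis can make under a deliberately simplified clock model: each host's local timestamps are assumed to map onto a common reference timeline with a known offset and a symmetric error bound. The names `ClockMapping`, `check_injection`, and `Verdict`, and the offset-plus-bound clock model itself, are hypothetical assumptions for this sketch, not the paper's synchronization algorithm or Loki's API.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


@dataclass(frozen=True)
class ClockMapping:
    """Hypothetical local-to-reference clock mapping for one host.

    Assumption: off-line synchronization yields an offset estimate with a
    symmetric error bound `eps` (a simplification, not Loki's algorithm).
    """
    offset: float
    eps: float

    def to_reference(self, local_t: float) -> tuple[float, float]:
        """Return the (lo, hi) interval on the reference timeline."""
        center = local_t + self.offset
        return (center - self.eps, center + self.eps)


class Verdict(Enum):
    CORRECT = "correct"      # injection certainly inside the target state
    INCORRECT = "incorrect"  # injection certainly outside the target state
    UNKNOWN = "unknown"      # clock uncertainty prevents a decision


def check_injection(inj_local: float, inj_clock: ClockMapping,
                    entry_local: float, exit_local: float,
                    state_clock: ClockMapping) -> Verdict:
    """Classify one injection against the interval a remote host spent in
    the target state (entered at entry_local, left at exit_local)."""
    inj_lo, inj_hi = inj_clock.to_reference(inj_local)
    entry_lo, entry_hi = state_clock.to_reference(entry_local)
    exit_lo, exit_hi = state_clock.to_reference(exit_local)
    # Certainly inside: after the latest possible entry and
    # before the earliest possible exit.
    if inj_lo >= entry_hi and inj_hi <= exit_lo:
        return Verdict.CORRECT
    # Certainly outside: before the earliest possible entry or
    # after the latest possible exit.
    if inj_hi < entry_lo or inj_lo > exit_hi:
        return Verdict.INCORRECT
    return Verdict.UNKNOWN


# Example: host B is in the target state from local time 10 to 20.
host_a = ClockMapping(offset=0.0, eps=0.5)
host_b = ClockMapping(offset=2.0, eps=0.5)
print(check_injection(15, host_a, 10, 20, host_b))  # Verdict.CORRECT
```

Using three outcomes keeps the check conservative: an injection counts as correct only if every clock error within the assumed bound still places it inside the target state, and injections near a state boundary are reported as undecided rather than guessed.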