Replay for debugging MPI parallel programs
Format: | Conference Proceeding |
Language: | English |
Summary: | Cyclic debugging often fails for parallel programs because message race conditions make their behavior nondeterministic. This paper presents an execution replay algorithm for debugging MPI parallel programs. A lexical analyzer identifies the MPI events that can cause nondeterministic execution. A later execution is then controlled so that it is equivalent to a reference execution, by keeping the order of these events identical in both executions. The proposed replay system uses a logical time-stamping algorithm and the derived data types provided by the MPI standard. The paper also presents a method for replaying both blocking and nonblocking message-passing events. The replay system was applied to a bitonic merge sort and other parallel programs. The results show that re-execution is reproducible and that the replay system is useful for finding communication errors. |
DOI: | 10.1109/MPIDC.1996.534108 |
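The summary above does not spell out the algorithm, so the following is only a rough illustration of the general record/replay idea it describes, not the paper's implementation. It is a minimal Python sketch with several stated assumptions. It does not use MPI. A wildcard receive (analogous to `MPI_ANY_SOURCE`) is simulated with in-memory queues. Each sender's messages stay in FIFO order, mirroring MPI's non-overtaking rule. Only the sender chosen at each receive is recorded, together with a Lamport-style logical timestamp. The names `run_receiver` and `LamportClock` are hypothetical. In record mode, an arbitrary ready sender "wins" each receive. In replay mode, each receive is forced to match the recorded sender, and the recomputed timestamp is checked against the trace.

```python
import random
from collections import deque


class LamportClock:
    """Hypothetical logical clock: on receive, time = max(local, sender) + 1."""

    def __init__(self):
        self.time = 0

    def merge(self, sender_time):
        self.time = max(self.time, sender_time) + 1
        return self.time


def run_receiver(messages, trace=None, seed=None):
    """Simulate a receive loop that accepts messages from any source.

    messages: dict {sender_rank: [(payload, sender_timestamp), ...]};
              each sender's list is FIFO (assumed non-overtaking order).
    trace:    None to record a reference run, or a trace returned by an
              earlier call, to replay that run.
    Returns (payloads in delivery order, trace of (sender, receive_time)).
    """
    rng = random.Random(seed)
    clock = LamportClock()
    pending = {src: deque(msgs) for src, msgs in messages.items()}
    total = sum(len(q) for q in pending.values())
    received, recorded = [], []
    for step in range(total):
        if trace is None:
            # Record mode: any sender with a pending message may win the race.
            src = rng.choice(sorted(s for s, q in pending.items() if q))
        else:
            # Replay mode: force this receive to match the recorded sender.
            src = trace[step][0]
        payload, sender_ts = pending[src].popleft()
        ts = clock.merge(sender_ts)
        if trace is not None and ts != trace[step][1]:
            raise RuntimeError(f"replay diverged at step {step}")
        received.append(payload)
        recorded.append((src, ts))
    return received, recorded
```

For example, a reference run can be recorded with `reference, log = run_receiver(msgs, seed=42)` and then reproduced with `run_receiver(msgs, trace=log)`. The replayed delivery order matches `reference` regardless of how the simulated race was resolved the first time. Recording only the matched sender per receive keeps the trace small. The logical timestamps serve as a cheap consistency check during replay.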