
Replay for debugging MPI parallel programs

Bibliographic Details
Main Authors: Chul-Eui Hong, Bum-Sik Lee, Gi-Won On, Dong-Hae Chi
Format: Conference Proceeding
Language: English
Description
Summary: Cyclic debugging often fails for parallel programs because message races make their executions nondeterministic. This paper presents an execution replay algorithm for debugging MPI parallel programs. A lexical analyzer identifies the MPI events that can cause nondeterministic execution; a re-execution is then controlled so that it is equivalent to a reference execution by keeping the order of those events identical in the two executions. The proposed replay system uses a logical time-stamping algorithm and the derived data types provided by the MPI standard. The paper also presents a method for replaying blocking and nonblocking message-passing events. The replay system was applied to bitonic-merge sort and other parallel programs. We found that re-execution is reproducible and that the replay system is useful for finding communication errors.
DOI: 10.1109/MPIDC.1996.534108
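
The record-and-replay idea in the summary can be illustrated with a small sketch. It is not the paper's implementation: it simulates one receiver in plain Python rather than calling MPI, and the names `receive_all` and `make_arrivals`, the three-sender workload, and the log format are all assumptions made for illustration. A reference run logs which source each wildcard receive matched, together with a Lamport-style logical timestamp; a replay run then forces the same matching order even when messages arrive in a different interleaving.

```python
import random


def receive_all(arrivals, replay_log=None):
    """Consume every message at one simulated receiver.

    arrivals: list of (source, send_clock, payload) in arrival order.
    replay_log: None to record, or the log of a reference run, which
    forces the same source to be matched at every step.
    Returns (delivered payloads, log of (source, receive_clock)).
    """
    pending = list(arrivals)
    clock = 0
    delivered, log = [], []
    for step in range(len(arrivals)):
        if replay_log is None:
            # Record mode: a wildcard receive takes whatever arrived first.
            msg = pending[0]
        else:
            # Replay mode: take the earliest message from the logged source.
            wanted = replay_log[step][0]
            msg = next(m for m in pending if m[0] == wanted)
        pending.remove(msg)
        source, send_clock, payload = msg
        clock = max(clock, send_clock) + 1  # logical clock update on receive
        delivered.append(payload)
        log.append((source, clock))
    return delivered, log


def make_arrivals(seed):
    """Hypothetical workload: three senders, two messages each. The seed
    decides how the senders interleave at the receiver (the message race),
    while messages from the same sender stay in order."""
    queues = {s: [(s, n + 1, f"p{s}-m{n}") for n in range(2)] for s in range(3)}
    rng = random.Random(seed)
    arrivals = []
    while any(queues.values()):
        src = rng.choice([s for s, q in queues.items() if q])
        arrivals.append(queues[src].pop(0))
    return arrivals


reference = receive_all(make_arrivals(seed=1))
replayed = receive_all(make_arrivals(seed=2), replay_log=reference[1])
print(replayed == reference)
```

The sketch relies on messages from a single sender arriving in the order they were sent, so logging only the matched source per receive is enough to reproduce the delivery order and the timestamps.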