Improving Selective Fault Tolerance in GPU Register Files by Relaxing Application Accuracy
Published in: | IEEE Transactions on Nuclear Science, 2020-07, Vol. 67 (7), p. 1573-1580 |
---|---|
Format: | Article |
Language: | English |
Summary: | The high computing power of graphics processing units (GPUs) makes them attractive for safety-critical applications, where reliability is a major concern. This article takes an approximate computing perspective, relaxing application accuracy in order to improve selective fault tolerance techniques. Our approach first assesses the vulnerability of a Kepler GPU to transient effects through a neutron beam experiment. Then, it performs a fault injection campaign to identify the most critical registers and relax the result accuracy. Finally, it uses the acquired data to improve the selective fault tolerance techniques in terms of occupation and performance. The results show that relaxing the application accuracy improved the GPU register file's reliability by 71.6% on average and, compared with selective hardening techniques, reduced the number of replicated registers by 41.4% on average, while maintaining 100% fault coverage. |
---|---|
ISSN: | 0018-9499, 1558-1578 |
DOI: | 10.1109/TNS.2020.2982162 |
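
The idea in the summary, that a fault injection campaign combined with a relaxed accuracy requirement can shrink the set of registers that need replication, can be illustrated with a small sketch. This is not the article's tooling or methodology: the toy integer kernel, the register names `a` to `d`, the 32-bit register width, and the tolerance values are all hypothetical and chosen only to make the effect visible. A register is treated as critical if any single-bit flip in it pushes the relative output error above the tolerance.

```python
WIDTH = 32  # assumed register width in bits (hypothetical model)


def kernel(regs):
    """Toy integer kernel: only the low byte of 'c' reaches the output,
    and 'd' is never read (a dead register)."""
    return regs["a"] * regs["b"] + (regs["c"] & 0xFF)


def critical_registers(regs, tolerance):
    """Exhaustive single-bit-flip campaign over every modeled register.

    Returns the names of registers for which at least one bit flip
    produces a relative output error greater than `tolerance`.
    """
    golden = kernel(regs)  # fault-free reference output
    critical = set()
    for name in regs:
        for bit in range(WIDTH):
            faulty = dict(regs)
            faulty[name] ^= 1 << bit  # inject one transient bit flip
            rel_err = abs(kernel(faulty) - golden) / abs(golden)
            if rel_err > tolerance:
                critical.add(name)
                break  # one unacceptable flip is enough
    return critical


regs = {"a": 1000, "b": 1000, "c": 0x12, "d": 7}

# Exact-match requirement: any output change counts as a failure.
strict = critical_registers(regs, tolerance=0.0)

# Relaxed requirement: accept up to 0.1% relative error.
relaxed = critical_registers(regs, tolerance=1e-3)

print("replicate under strict accuracy: ", sorted(strict))
print("replicate under relaxed accuracy:", sorted(relaxed))
```

In this toy setup the strict campaign flags `a`, `b`, and `c`, while the relaxed one flags only `a` and `b`, because every flip in `c` changes the output by at most 128 out of roughly one million. Replicating only the flagged registers still covers every fault that violates the chosen tolerance, which mirrors, in miniature, the trade-off the summary describes.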