Efficient Identification of Critical Faults in Memristor-Based Inferencing Accelerators

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-07, Vol. 41 (7), p. 2301-2314
Main Authors: Chen, Ching-Yuan; Chakrabarty, Krishnendu
Format: Article
Language: English
Description
Summary: Deep neural networks (DNNs) are becoming ubiquitous, but hardware-level reliability is a concern when DNN models are mapped to emerging neuromorphic technologies such as memristor-based crossbars. Because DNN architectures are inherently fault tolerant and many faults do not affect inferencing accuracy, careful analysis is needed to identify the faults that are critical for a given application. We present a misclassification-driven training (MDT) algorithm to efficiently identify critical faults (FCFs) in the crossbar. Our results for three DNNs on the CIFAR-10 data set show that MDT can rapidly and accurately identify a large number of FCFs, up to 20× faster than a baseline method of forward inferencing with randomly injected faults. We use the set of FCFs obtained with MDT and the set of benign faults obtained with forward inferencing to train a machine learning (ML) model that efficiently classifies all the crossbar faults in terms of their criticality. With the ground truth generated by MDT and forward inferencing, we show that the ML models can classify millions of faults within minutes with a classification accuracy of up to 99%. We also show that the ML model trained using CIFAR-10 provides high accuracy when it is used to carry out fault classification for the ImageNet data set. Finally, we present a fault-tolerance solution that exploits this high degree of criticality-classification accuracy, leading to a 92.5% reduction in the redundancy needed for fault tolerance.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2021.3102894
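
The summary compares MDT against a baseline of forward inferencing with randomly injected faults. The sketch below illustrates only that baseline idea on a toy example; it is not the authors' code and does not implement MDT. It rests on several assumptions that are not stated in the record: a single linear layer stands in for the crossbar, faults are modeled as stuck-at values in one cell, and a fault counts as critical if it lowers accuracy on a small labeled sample set. All names (`predict`, `is_critical`, `random_fault_campaign`, `WEIGHTS`, `SAMPLES`) are hypothetical.

```python
import random


def predict(weights, x):
    """Return the index of the output column with the largest weighted sum."""
    n_out = len(weights[0])
    scores = [sum(row[j] * xi for row, xi in zip(weights, x)) for j in range(n_out)]
    return max(range(n_out), key=scores.__getitem__)


def accuracy(weights, samples):
    """Fraction of (input, label) pairs that the layer classifies correctly."""
    correct = sum(1 for x, label in samples if predict(weights, x) == label)
    return correct / len(samples)


def is_critical(weights, samples, fault, max_drop=0.0):
    """Assumed criterion: a fault is critical if accuracy drops by more than max_drop."""
    row, col, stuck_value = fault
    faulty = [list(r) for r in weights]  # copy so the fault-free weights stay intact
    faulty[row][col] = stuck_value       # assumed stuck-at fault model for one cell
    return accuracy(weights, samples) - accuracy(faulty, samples) > max_drop


def random_fault_campaign(weights, samples, num_trials, seed=0):
    """Inject randomly chosen single faults and sort them by forward inferencing."""
    rng = random.Random(seed)
    critical, benign = set(), set()
    for _ in range(num_trials):
        fault = (
            rng.randrange(len(weights)),     # crossbar row
            rng.randrange(len(weights[0])),  # crossbar column
            rng.choice((0.0, 1.0)),          # stuck-at value
        )
        (critical if is_critical(weights, samples, fault) else benign).add(fault)
    return critical, benign


# Toy 2x2 "crossbar" and four labeled inputs (illustrative data only).
WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
SAMPLES = [([1.0, 0.2], 0), ([0.2, 1.0], 1), ([0.9, 0.1], 0), ([0.1, 0.8], 1)]

if __name__ == "__main__":
    critical, benign = random_fault_campaign(WEIGHTS, SAMPLES, num_trials=50)
    print("critical faults:", sorted(critical))
    print("benign faults:", sorted(benign))
```

Each trial costs a full pass over the sample set, and many randomly drawn faults turn out to be benign. That cost is why a targeted search such as the MDT algorithm described in the summary can be faster than this baseline.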