Toward Functional Safety of Systolic Array-Based Deep Learning Hardware Accelerators

Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2021-03, Vol. 29 (3), p. 485-498
Main Authors: Kundu, Shamik; Banerjee, Suvadeep; Raha, Arnab; Natarajan, Suriyaprakash; Basu, Kanad
Format: Article
Language: English
Description
Summary: High accuracy and ever-increasing computing power have made deep neural networks (DNNs) the algorithm of choice for machine learning, computer vision, and image processing applications across the computing spectrum. To this end, Google developed the tensor processing unit (TPU), which accelerates the computationally intensive matrix multiplication operations of a DNN on a systolic array architecture. Faults that manifest in the datapath of such a systolic array, caused by latent manufacturing defects or single-event effects, may lead to functional safety (FuSa) violations. Although DNNs are known to tolerate minor perturbations owing to their inherent fault tolerance, we show that the classification accuracy of a model plummets from 97.4% to 7.75% at a fault rate of only 0.0003% in the accelerator, which implies catastrophic consequences when such accelerators are deployed in mission-critical systems. Hence, to ensure the FuSa of such accelerators, this article provides an extensive FuSa assessment of an accelerator exposed to datapath faults, varying the network parameters and the position and characteristics of the induced error across multiple exhaustive data sets. Furthermore, we propose two novel strategies for obtaining a small set of functional test patterns that detect FuSa violations in a DNN accelerator. Our experimental results demonstrate that the obtained test sets achieve an average fault coverage of 92.63% (in some cases, up to 100%) with a cardinality as low as 0.1% of the entire test data set.
ISSN: 1063-8210, 1557-9999
DOI: 10.1109/TVLSI.2020.3048829
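
Illustration: The summary refers to datapath faults in a systolic array and to the fault coverage of a test set. The sketch below is a toy illustration of those two ideas only, not the authors' method or fault model. It assumes a single stuck-at-1 bit on the partial sum leaving one multiply-accumulate (MAC) unit, and the names `matmul` and `fault_coverage` are hypothetical. A fault counts as detected when at least one test pattern produces an output that differs from the fault-free result.

```python
def matmul(a, w, fault=None):
    """Integer matrix product a @ w, accumulated one MAC at a time.

    fault: optional (row, col, bit) tuple (an assumed fault model). The
    partial sum leaving the MAC unit that holds w[row][col] has the
    given bit stuck at 1.
    """
    rows, inner, cols = len(a), len(w), len(w[0])
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc += a[i][k] * w[k][j]
                if fault is not None and fault[:2] == (k, j):
                    acc |= 1 << fault[2]  # force the faulty bit to 1
            out[i][j] = acc
    return out


def fault_coverage(patterns, w, faults):
    """Fraction of faults exposed by at least one input pattern."""
    detected = 0
    for f in faults:
        if any(matmul([p], w, f) != matmul([p], w) for p in patterns):
            detected += 1
    return detected / len(faults)


weights = [[1, 2], [3, 4]]
faults = [(0, 0, 3), (1, 1, 1)]

golden = matmul([[1, 1]], weights)             # [[4, 6]]
faulty = matmul([[1, 1]], weights, faults[0])  # [[12, 6]]

# The pattern [1, 1] masks the second fault: the affected bit is already 1.
cov_one = fault_coverage([[1, 1]], weights, faults)          # 0.5
cov_two = fault_coverage([[1, 1], [0, 1]], weights, faults)  # 1.0
```

In this toy case, adding one more pattern raises coverage from 50% to 100%, which mirrors the general idea of choosing a small set of functional test patterns that still exposes most faults.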