Investigating the impact of transient hardware faults on deep learning neural network inference

Bibliographic Details
Published in: Software testing, verification & reliability, 2024-06, Vol.34 (4), p.n/a
Main Authors: Rahman, Md Hasanur, Laskar, Sabuj, Li, Guanpeng
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523
container_end_page n/a
container_issue 4
container_start_page
container_title Software testing, verification & reliability
container_volume 34
creator Rahman, Md Hasanur
Laskar, Sabuj
Li, Guanpeng
description Summary Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have asked to what extent these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article examines the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as the DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4× higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher‐order bit positions in tensors, as well as the initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our findings highlight that transient hardware faults pose a greater risk of causing safety‐critical misclassifications than intrinsic algorithmic inaccuracies do.
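The abstract's point about higher-order bit positions can be illustrated with a single bit flip in one 32-bit floating-point value. The following is a minimal sketch only, assuming IEEE 754 float32 storage; the helper `flip_bit` and the sample value `weight` are hypothetical names introduced here for illustration. It is not the TensorFI API and not the authors' injection procedure.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the IEEE 754 float32 encoding of `value`.

    Bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign. Hypothetical helper for illustration.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask toggles exactly one bit.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5  # assumed example value, exactly representable in float32

low = flip_bit(weight, 0)    # lowest mantissa bit: changes the value by 2**-24
high = flip_bit(weight, 30)  # highest exponent bit: the value becomes 2**127
sign = flip_bit(weight, 31)  # sign bit: the value becomes -0.5

print(f"original       : {weight}")
print(f"bit 0 flipped  : {low}")
print(f"bit 30 flipped : {high}")
print(f"bit 31 flipped : {sign}")
```

A flip in the lowest mantissa bit leaves the value practically unchanged, whereas a flip in the top exponent bit changes it by dozens of orders of magnitude, which is consistent with the reported finding that higher-order bits warrant prioritized protection.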
doi_str_mv 10.1002/stvr.1873
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3053029142</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3053029142</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</originalsourceid><addsrcrecordid>eNp1kD1PwzAURS0EEqUw8A8sMTGkfbYbOx5RxUelSkhQWC0nsduU1Am206r_noSyMt3l3Pv0DkK3BCYEgE5D3PsJyQQ7QyMCUiaEZ_IcjUBySCBj7BJdhbAFAC65HCG1cHsTYrXWsXJrHDcGV7tWFxE3FkevXaiMi3ijfXnQ3mCruzoG3DhcGtPi2mjvhqIzndd1H_HQ-C9cOWu8cYW5RhdW18Hc_OUYfTw9ruYvyfL1eTF_WCYFlZQllqSCi5SCtTLPQRe0JFzYnBWWwawkRJeCpJmYcUFzClrmkrC0zEXPlDqlbIzuTrutb767_iO1bTrv-pOKQcqASjIbqPsTVfgmBG-san210_6oCKjBnxr8qcFfz05P7KGqzfF_UL2vPt9-Gz9spXPp</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053029142</pqid></control><display><type>article</type><title>Investigating the impact of transient hardware faults on deep learning neural network inference</title><source>Wiley-Blackwell Read &amp; Publish Collection</source><creator>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</creator><creatorcontrib>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</creatorcontrib><description>Summary Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. 
This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4 × higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions. This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. 
Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</description><identifier>ISSN: 0960-0833</identifier><identifier>EISSN: 1099-1689</identifier><identifier>DOI: 10.1002/stvr.1873</identifier><language>eng</language><publisher>Chichester: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Artificial neural networks ; deep neural networks ; Faults ; Hardware ; Machine learning ; Neural networks ; Safety ; safety‐critical misclassifications ; Space applications ; Tensors ; transient hardware faults</subject><ispartof>Software testing, verification &amp; reliability, 2024-06, Vol.34 (4), p.n/a</ispartof><rights>2024 The Authors. published by John Wiley &amp; Sons Ltd.</rights><rights>2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</cites><orcidid>0009-0002-5540-8751</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rahman, Md Hasanur</creatorcontrib><creatorcontrib>Laskar, Sabuj</creatorcontrib><creatorcontrib>Li, Guanpeng</creatorcontrib><title>Investigating the impact of transient hardware faults on deep learning neural network inference</title><title>Software testing, verification &amp; reliability</title><description>Summary Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment 
of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4 × higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions. This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. 
Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>deep neural networks</subject><subject>Faults</subject><subject>Hardware</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Safety</subject><subject>safety‐critical misclassifications</subject><subject>Space applications</subject><subject>Tensors</subject><subject>transient hardware faults</subject><issn>0960-0833</issn><issn>1099-1689</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><recordid>eNp1kD1PwzAURS0EEqUw8A8sMTGkfbYbOx5RxUelSkhQWC0nsduU1Am206r_noSyMt3l3Pv0DkK3BCYEgE5D3PsJyQQ7QyMCUiaEZ_IcjUBySCBj7BJdhbAFAC65HCG1cHsTYrXWsXJrHDcGV7tWFxE3FkevXaiMi3ijfXnQ3mCruzoG3DhcGtPi2mjvhqIzndd1H_HQ-C9cOWu8cYW5RhdW18Hc_OUYfTw9ruYvyfL1eTF_WCYFlZQllqSCi5SCtTLPQRe0JFzYnBWWwawkRJeCpJmYcUFzClrmkrC0zEXPlDqlbIzuTrutb767_iO1bTrv-pOKQcqASjIbqPsTVfgmBG-san210_6oCKjBnxr8qcFfz05P7KGqzfF_UL2vPt9-Gz9spXPp</recordid><startdate>202406</startdate><enddate>202406</enddate><creator>Rahman, Md Hasanur</creator><creator>Laskar, Sabuj</creator><creator>Li, Guanpeng</creator><general>Wiley Subscription Services, Inc</general><scope>24P</scope><scope>WIN</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0009-0002-5540-8751</orcidid></search><sort><creationdate>202406</creationdate><title>Investigating the impact of transient hardware faults on deep learning neural network 
inference</title><author>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>deep neural networks</topic><topic>Faults</topic><topic>Hardware</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Safety</topic><topic>safety‐critical misclassifications</topic><topic>Space applications</topic><topic>Tensors</topic><topic>transient hardware faults</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rahman, Md Hasanur</creatorcontrib><creatorcontrib>Laskar, Sabuj</creatorcontrib><creatorcontrib>Li, Guanpeng</creatorcontrib><collection>Open Access: Wiley-Blackwell Open Access Journals</collection><collection>Wiley Online Library Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Software testing, verification &amp; reliability</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rahman, Md Hasanur</au><au>Laskar, Sabuj</au><au>Li, Guanpeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Investigating the impact of transient hardware faults on deep learning neural network inference</atitle><jtitle>Software testing, verification &amp; 
reliability</jtitle><date>2024-06</date><risdate>2024</risdate><volume>34</volume><issue>4</issue><epage>n/a</epage><issn>0960-0833</issn><eissn>1099-1689</eissn><abstract>Summary Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4 × higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. 
Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions. This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</abstract><cop>Chichester</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/stvr.1873</doi><tpages>24</tpages><orcidid>https://orcid.org/0009-0002-5540-8751</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0960-0833
ispartof Software testing, verification & reliability, 2024-06, Vol.34 (4), p.n/a
issn 0960-0833
1099-1689
language eng
recordid cdi_proquest_journals_3053029142
source Wiley-Blackwell Read & Publish Collection
subjects Algorithms
Artificial neural networks
deep neural networks
Faults
Hardware
Machine learning
Neural networks
Safety
safety‐critical misclassifications
Space applications
Tensors
transient hardware faults
title Investigating the impact of transient hardware faults on deep learning neural network inference
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A22%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Investigating%20the%20impact%20of%20transient%20hardware%20faults%20on%20deep%20learning%20neural%20network%20inference&rft.jtitle=Software%20testing,%20verification%20&%20reliability&rft.au=Rahman,%20Md%20Hasanur&rft.date=2024-06&rft.volume=34&rft.issue=4&rft.epage=n/a&rft.issn=0960-0833&rft.eissn=1099-1689&rft_id=info:doi/10.1002/stvr.1873&rft_dat=%3Cproquest_cross%3E3053029142%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3053029142&rft_id=info:pmid/&rfr_iscdi=true