Investigating the impact of transient hardware faults on deep learning neural network inference
Summary Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultane...
Published in: | Software testing, verification & reliability, 2024-06, Vol.34 (4), p.n/a |
---|---|
Main Authors: | Rahman, Md Hasanur; Laskar, Sabuj; Li, Guanpeng |
Format: | Article |
Language: | English |
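Subjects: | Algorithms; Artificial neural networks; deep neural networks; Faults; Hardware; Machine learning; Neural networks; Safety; safety‐critical misclassifications; Space applications; Tensors; transient hardware faults |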
Citations: | Items that this one cites |
Online Access: | Get full text |
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523 |
container_end_page | n/a |
container_issue | 4 |
container_start_page | |
container_title | Software testing, verification & reliability |
container_volume | 34 |
creator | Rahman, Md Hasanur; Laskar, Sabuj; Li, Guanpeng |
description | Summary
Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4× higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications. |
doi_str_mv | 10.1002/stvr.1873 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3053029142</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3053029142</sourcerecordid><originalsourceid>FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</originalsourceid><addsrcrecordid>eNp1kD1PwzAURS0EEqUw8A8sMTGkfbYbOx5RxUelSkhQWC0nsduU1Am206r_noSyMt3l3Pv0DkK3BCYEgE5D3PsJyQQ7QyMCUiaEZ_IcjUBySCBj7BJdhbAFAC65HCG1cHsTYrXWsXJrHDcGV7tWFxE3FkevXaiMi3ijfXnQ3mCruzoG3DhcGtPi2mjvhqIzndd1H_HQ-C9cOWu8cYW5RhdW18Hc_OUYfTw9ruYvyfL1eTF_WCYFlZQllqSCi5SCtTLPQRe0JFzYnBWWwawkRJeCpJmYcUFzClrmkrC0zEXPlDqlbIzuTrutb767_iO1bTrv-pOKQcqASjIbqPsTVfgmBG-san210_6oCKjBnxr8qcFfz05P7KGqzfF_UL2vPt9-Gz9spXPp</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3053029142</pqid></control><display><type>article</type><title>Investigating the impact of transient hardware faults on deep learning neural network inference</title><source>Wiley-Blackwell Read & Publish Collection</source><creator>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</creator><creatorcontrib>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</creatorcontrib><description>Summary
Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4
× higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</description><identifier>ISSN: 0960-0833</identifier><identifier>EISSN: 1099-1689</identifier><identifier>DOI: 10.1002/stvr.1873</identifier><language>eng</language><publisher>Chichester: Wiley Subscription Services, Inc</publisher><subject>Algorithms ; Artificial neural networks ; deep neural networks ; Faults ; Hardware ; Machine learning ; Neural networks ; Safety ; safety‐critical misclassifications ; Space applications ; Tensors ; transient hardware faults</subject><ispartof>Software testing, verification & reliability, 2024-06, Vol.34 (4), p.n/a</ispartof><rights>2024 The Authors. published by John Wiley & Sons Ltd.</rights><rights>2024. This article is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</cites><orcidid>0009-0002-5540-8751</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Rahman, Md Hasanur</creatorcontrib><creatorcontrib>Laskar, Sabuj</creatorcontrib><creatorcontrib>Li, Guanpeng</creatorcontrib><title>Investigating the impact of transient hardware faults on deep learning neural network inference</title><title>Software testing, verification & reliability</title><description>Summary
Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4
× higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</description><subject>Algorithms</subject><subject>Artificial neural networks</subject><subject>deep neural networks</subject><subject>Faults</subject><subject>Hardware</subject><subject>Machine learning</subject><subject>Neural networks</subject><subject>Safety</subject><subject>safety‐critical misclassifications</subject><subject>Space applications</subject><subject>Tensors</subject><subject>transient hardware faults</subject><issn>0960-0833</issn><issn>1099-1689</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>24P</sourceid><recordid>eNp1kD1PwzAURS0EEqUw8A8sMTGkfbYbOx5RxUelSkhQWC0nsduU1Am206r_noSyMt3l3Pv0DkK3BCYEgE5D3PsJyQQ7QyMCUiaEZ_IcjUBySCBj7BJdhbAFAC65HCG1cHsTYrXWsXJrHDcGV7tWFxE3FkevXaiMi3ijfXnQ3mCruzoG3DhcGtPi2mjvhqIzndd1H_HQ-C9cOWu8cYW5RhdW18Hc_OUYfTw9ruYvyfL1eTF_WCYFlZQllqSCi5SCtTLPQRe0JFzYnBWWwawkRJeCpJmYcUFzClrmkrC0zEXPlDqlbIzuTrutb767_iO1bTrv-pOKQcqASjIbqPsTVfgmBG-san210_6oCKjBnxr8qcFfz05P7KGqzfF_UL2vPt9-Gz9spXPp</recordid><startdate>202406</startdate><enddate>202406</enddate><creator>Rahman, Md Hasanur</creator><creator>Laskar, Sabuj</creator><creator>Li, Guanpeng</creator><general>Wiley Subscription Services, 
Inc</general><scope>24P</scope><scope>WIN</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0009-0002-5540-8751</orcidid></search><sort><creationdate>202406</creationdate><title>Investigating the impact of transient hardware faults on deep learning neural network inference</title><author>Rahman, Md Hasanur ; Laskar, Sabuj ; Li, Guanpeng</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Artificial neural networks</topic><topic>deep neural networks</topic><topic>Faults</topic><topic>Hardware</topic><topic>Machine learning</topic><topic>Neural networks</topic><topic>Safety</topic><topic>safety‐critical misclassifications</topic><topic>Space applications</topic><topic>Tensors</topic><topic>transient hardware faults</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rahman, Md Hasanur</creatorcontrib><creatorcontrib>Laskar, Sabuj</creatorcontrib><creatorcontrib>Li, Guanpeng</creatorcontrib><collection>Open Access: Wiley-Blackwell Open Access Journals</collection><collection>Wiley Online Library Open Access</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Software testing, verification & 
reliability</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rahman, Md Hasanur</au><au>Laskar, Sabuj</au><au>Li, Guanpeng</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Investigating the impact of transient hardware faults on deep learning neural network inference</atitle><jtitle>Software testing, verification & reliability</jtitle><date>2024-06</date><risdate>2024</risdate><volume>34</volume><issue>4</issue><epage>n/a</epage><issn>0960-0833</issn><eissn>1099-1689</eissn><abstract>Summary
Safety‐critical applications, such as autonomous vehicles, healthcare, and space applications, have witnessed widespread deployment of deep neural networks (DNNs). Inherent algorithmic inaccuracies have consistently been a prevalent cause of misclassifications, even in modern DNNs. Simultaneously, with an ongoing effort to minimize the footprint of contemporary chip design, there is a continual rise in the likelihood of transient hardware faults in deployed DNN models. Consequently, researchers have wondered the extent to which these faults contribute to DNN misclassifications compared to algorithmic inaccuracies. This article delves into the impact of DNN misclassifications caused by transient hardware faults and intrinsic algorithmic inaccuracies in safety‐critical applications. Initially, we enhance a cutting‐edge fault injector, TensorFI, for TensorFlow applications to facilitate fault injections on modern DNN non‐sequential models in a scalable manner. Subsequently, we analyse the DNN‐inferred outcomes based on our defined safety‐critical metrics. Finally, we conduct extensive fault injection experiments and a comprehensive analysis to achieve the following objectives: (1) investigate the impact of different target class groupings on DNN failures and (2) pinpoint the most vulnerable bit locations within tensors, as well as DNN layers accountable for the majority of safety‐critical misclassifications. Our findings regarding different grouping formations reveal that failures induced by transient hardware faults can have a substantially greater impact (with a probability up to 4
× higher) on safety‐critical applications compared to those resulting from algorithmic inaccuracies. Additionally, our investigation demonstrates that higher order bit positions in tensors, as well as initial and final layers of DNNs, necessitate prioritized protection compared to other regions.
This study investigates the impact of transient hardware faults and algorithmic inaccuracies on DNN misclassifications in terms of safety‐critical behavior. Specifically, this study offers a more comprehensive understanding of the impact of multifaceted factors influencing the likelihood of safety‐critical misclassifications across different DNN models. Our thorough findings highlight that transient hardware faults pose a greater risk than intrinsic algorithmic inaccuracies to cause safety‐critical misclassifications.</abstract><cop>Chichester</cop><pub>Wiley Subscription Services, Inc</pub><doi>10.1002/stvr.1873</doi><tpages>24</tpages><orcidid>https://orcid.org/0009-0002-5540-8751</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0960-0833 |
ispartof | Software testing, verification & reliability, 2024-06, Vol.34 (4), p.n/a |
issn | 0960-0833 1099-1689 |
language | eng |
recordid | cdi_proquest_journals_3053029142 |
source | Wiley-Blackwell Read & Publish Collection |
subjects | Algorithms; Artificial neural networks; deep neural networks; Faults; Hardware; Machine learning; Neural networks; Safety; safety‐critical misclassifications; Space applications; Tensors; transient hardware faults |
title | Investigating the impact of transient hardware faults on deep learning neural network inference |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T21%3A22%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Investigating%20the%20impact%20of%20transient%20hardware%20faults%20on%20deep%20learning%20neural%20network%20inference&rft.jtitle=Software%20testing,%20verification%20&%20reliability&rft.au=Rahman,%20Md%20Hasanur&rft.date=2024-06&rft.volume=34&rft.issue=4&rft.epage=n/a&rft.issn=0960-0833&rft.eissn=1099-1689&rft_id=info:doi/10.1002/stvr.1873&rft_dat=%3Cproquest_cross%3E3053029142%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c2923-f15767520ff9bb0ac2d167fb3cf304d11ad715874672b20a9b9135db77fbda523%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3053029142&rft_id=info:pmid/&rfr_iscdi=true |