
Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications

Neural network (NN) operation places high demands on storage resources and parallel computing, which poses significant challenges for deploying NNs on Internet-of-Things (IoT) devices. Consequently, this work proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, to achieve state-of-the-art (SOTA) end-to-end inference at the edge. The innovations of this article are as follows: 1) to minimize the bit width of the weights while keeping the accuracy loss within a small range, cross-layer error tolerance is analyzed and mixed-precision quantization is adopted for cross-layer mapping; 2) a dynamic reconfigurable tensor processing unit (DR-TPU) with approximate computing is proposed, which brings a 1.45× reduction in computing energy within 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy is adopted, eliminating recurrent fetching from the on-chip and off-chip memories. The number of on-chip storage accesses can be reduced by 25%-60%, and the on-chip memory capacity can be reduced to half of the original. The processor has been implemented in 28-nm CMOS technology. Combining the above, the proposed architecture achieves a 53.1% power reduction and 17.2-TOPS/W energy efficiency.
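The layerwise mixed-precision idea can be sketched in a few lines. The symmetric uniform quantizer and the per-layer bit widths below are illustrative assumptions, not the paper's exact scheme:

import numpy as np

def quantize(w, bits):
    # Symmetric uniform quantization of a weight tensor to `bits` bits.
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit weights
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                      # dequantized approximation

# Hypothetical bit widths: more error-tolerant layers get fewer bits.
layer_bits = {"conv1": 8, "conv2_x": 6, "conv3_x": 5, "fc": 8}

rng = np.random.default_rng(0)
for name, bits in layer_bits.items():
    w = rng.standard_normal((64, 64)).astype(np.float32)
    mse = np.mean((w - quantize(w, bits)) ** 2)
    print(f"{name}: {bits}-bit weights, quantization MSE = {mse:.5f}")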

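The saving from IFM reuse can likewise be illustrated with a simple fetch counter. The tile and channel counts are hypothetical, and the perfect reuse modeled here is an idealized upper bound; the paper's measured 25%-60% reduction is presumably smaller because reuse is limited by buffer capacity and dataflow:

# Compare IFM fetches with and without reuse across output channels.
ifm_tiles = 49         # hypothetical number of IFM tiles in one layer
out_channels = 64      # output channels that all consume the same IFM

fetches_naive = ifm_tiles * out_channels  # refetch each tile per channel
fetches_reuse = ifm_tiles                 # fetch once, reuse for all channels

saving = 1 - fetches_reuse / fetches_naive
print(f"naive: {fetches_naive} fetches, with reuse: {fetches_reuse} "
      f"({saving:.1%} fewer IFM fetches in the idealized case)")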

Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024-05, Vol. 32 (5), p. 797-809
Main Authors: Li, Zeju; Wang, Qinfan; Zou, Zihan; Shen, Qiao; Xie, Na; Cai, Hao; Zhang, Hao; Liu, Bo
Format: Article
Language: English
Subjects: Approximate computing; Artificial neural networks; Chips (memory devices); Computer architecture; Energy efficiency; energy-efficient design; Error analysis; Feature maps; Hardware; Internet of Things (IoT); layerwise quantization; Microprocessors; neural network (NN) processor; Neural networks; Power demand; Power management; Quantization (signal); System-on-chip; Tensors
DOI: 10.1109/TVLSI.2024.3369648
ISSN: 1063-8210
EISSN: 1557-9999