Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications
Neural network (NN) operation places high demands on storage resources and parallel computing, which poses significant challenges for deploying NNs on Internet-of-Things (IoT) devices. This work therefore proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, that achieves state-of-the-art (SOTA) end-to-end inference at the edge. The innovations of this article are as follows: 1) to minimize the bit width of the weights while keeping the accuracy loss within a small range, cross-layer error tolerance is analyzed and mixed-precision quantization is adopted for cross-layer mapping; 2) a dynamic reconfigurable tensor processing unit (DR-TPU) with approximate computing is proposed, which brings a 1.45× reduction in computing energy within a 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy is adopted, eliminating recurrent fetches from the on-chip and off-chip memories. The number of on-chip storage accesses is reduced by 25%-60%, and the on-chip memory capacity can be reduced to half of the original. The processor has been implemented in 28-nm CMOS technology. Combining the above, the proposed architecture achieves a 53.1% power reduction and an energy efficiency of 17.2 TOPS/W.
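The first innovation, layerwise mixed-precision quantization guided by per-layer error tolerance, lends itself to a short illustration. The sketch below is a minimal greedy search, not the authors' method: the bit-width candidates, the tolerance threshold, and the `evaluate` callback (assumed to return validation accuracy for a given set of layer weights) are all illustrative assumptions.

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric fake-quantization of a weight tensor to `bits` bits
    (quantize, then dequantize, so the evaluate callback can run unchanged)."""
    qmax = 2.0 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

def assign_layer_bitwidths(layers, evaluate, baseline_acc,
                           tol=0.005, candidates=(4, 6, 8)):
    """Greedily pick, for each layer, the narrowest candidate bit width
    whose accuracy drop (with all other layers left at full precision)
    stays within `tol` of the floating-point baseline."""
    bitwidths = {}
    for name, w in layers.items():
        bitwidths[name] = max(candidates)        # safe fallback: widest
        for bits in sorted(candidates):          # try narrowest first
            trial = {**layers, name: quantize_weights(w, bits)}
            if baseline_acc - evaluate(trial) <= tol:
                bitwidths[name] = bits
                break
    return bitwidths
```

In this spirit, error-tolerant layers end up with narrow weights while sensitive layers keep wider ones, which is the cross-layer mapping idea the abstract describes.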
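The second innovation couples the reconfigurable tensor unit with approximate arithmetic. The record does not state which approximation scheme the DR-TPU uses, so the sketch below stands in with one classic hardware trick, truncating low-order operand bits to shrink the partial-product array; the `drop_bits` value is an arbitrary choice.

```python
def approx_mul(a: int, b: int, drop_bits: int = 4) -> int:
    """Approximate unsigned multiply: drop the `drop_bits` least significant
    bits of each operand, multiply the shortened values, and shift back.
    In hardware this removes rows of the partial-product array, trading a
    bounded relative error for lower switching energy."""
    return ((a >> drop_bits) * (b >> drop_bits)) << (2 * drop_bits)

exact = 2000 * 3000                 # 6,000,000
approx = approx_mul(2000, 3000)     # 5,984,000
print(abs(exact - approx) / exact)  # ~0.003 relative error
```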
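The third innovation, IFM reuse with over-writeback, lets each output row overwrite an input row that no later computation will read, so a single buffer serves both the input and output feature maps. The sketch below shows the bookkeeping for a single-channel 3×3, stride-1 convolution; the row-streamed layout and the one-row writeback delay are illustrative assumptions about how such a scheme can work, not the paper's exact dataflow.

```python
import numpy as np

def conv3x3_overwriteback(buf, kernel):
    """Row-streamed 3x3 convolution (stride 1, zero padding) computed in a
    single shared buffer: output row r is held back one step, then written
    over input row r-1, which no later output row reads. `buf` enters
    holding the IFM and returns holding the OFM in place."""
    H, W = buf.shape
    zeros = np.zeros(W)
    pending = None                        # one-row writeback delay
    for r in range(H):
        # Gather the three input rows this output row depends on.
        top = buf[r - 1] if r > 0 else zeros
        mid = buf[r]
        bot = buf[r + 1] if r < H - 1 else zeros
        win = np.pad(np.stack([top, mid, bot]), ((0, 0), (1, 1)))
        out = np.array([np.sum(win[:, c:c + 3] * kernel) for c in range(W)])
        if pending is not None:
            buf[r - 1] = pending          # input row r-1 is now dead
        pending = out
    buf[H - 1] = pending
    return buf
```

Because output row r is delayed one step before overwriting input row r-1, every read of rows r-1, r, and r+1 completes before that slot is recycled; this kind of discipline is what allows on-chip capacity to drop toward half, in line with the savings the abstract reports.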
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, May 2024, Vol. 32, No. 5, pp. 797-809
Main Authors: Li, Zeju; Wang, Qinfan; Zou, Zihan; Shen, Qiao; Xie, Na; Cai, Hao; Zhang, Hao; Liu, Bo
Format: Article
Language: English
Publisher: IEEE, New York
Subjects: Approximate computing; Artificial neural networks; Chips (memory devices); Computer architecture; Energy efficiency; energy-efficient design; Error analysis; Feature maps; Hardware; Internet of Things (IoT); layerwise quantization; Microprocessors; neural network (NN) processor; Neural networks; Power demand; Power management; Quantization (signal); System-on-chip; Tensors
DOI: 10.1109/TVLSI.2024.3369648
ISSN: 1063-8210
EISSN: 1557-9999
CODEN: IEVSE9
Source: IEEE Electronic Library (IEL) Journals