Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications
Neural network (NN) operation places high demands on storage resources and parallel computing, which poses significant challenges for deploying NNs on Internet-of-Things (IoT) devices. This work therefore proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, that achieves state-of-the-art (SOTA) end-to-end inference at the edge. The innovations of this article are as follows: 1) to minimize the bit width of the weights while keeping the accuracy loss within a small range, cross-layer error tolerance is analyzed and mixed-precision quantization is adopted for cross-layer mapping; 2) a dynamic reconfigurable tensor processing unit (DR-TPU) with approximate computing is proposed, which brings a 1.45× reduction in computing energy within a 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy is adopted, eliminating recurrent fetches from the on-chip and off-chip memories. The number of on-chip storage accesses is reduced by 25%-60%, and the on-chip memory capacity can be reduced to half of the original. The processor has been implemented in 28-nm CMOS technology. Combining the above, the proposed architecture achieves a 53.1% power reduction and an energy efficiency of 17.2 TOPS/W.
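The first innovation, layerwise mixed-precision quantization guided by per-layer error tolerance, lends itself to a short illustration. The sketch below is a minimal greedy search, not the authors' method: the bit-width candidates, the tolerance threshold, and the `evaluate` callback (assumed to return validation accuracy for a given set of layer weights) are all illustrative assumptions.

```python
import numpy as np

def quantize_weights(w, bits):
    """Uniform symmetric fake-quantization of a weight tensor to `bits` bits
    (quantize, then dequantize, so the evaluate callback can run unchanged)."""
    qmax = 2.0 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale) * scale

def assign_layer_bitwidths(layers, evaluate, baseline_acc,
                           tol=0.005, candidates=(4, 6, 8)):
    """Greedily pick, for each layer, the narrowest candidate bit width
    whose accuracy drop (with all other layers left at full precision)
    stays within `tol` of the floating-point baseline."""
    bitwidths = {}
    for name, w in layers.items():
        bitwidths[name] = max(candidates)        # safe fallback: widest
        for bits in sorted(candidates):          # try narrowest first
            trial = {**layers, name: quantize_weights(w, bits)}
            if baseline_acc - evaluate(trial) <= tol:
                bitwidths[name] = bits
                break
    return bitwidths
```

In this spirit, error-tolerant layers end up with narrow weights while sensitive layers keep wider ones, which is the cross-layer mapping idea the abstract describes.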
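The second innovation couples the reconfigurable tensor unit with approximate arithmetic. The record does not state which approximation scheme the DR-TPU uses, so the sketch below stands in with one classic hardware trick, truncating low-order operand bits to shrink the partial-product array; the `drop_bits` value is an arbitrary choice.

```python
def approx_mul(a: int, b: int, drop_bits: int = 4) -> int:
    """Approximate unsigned multiply: drop the `drop_bits` least significant
    bits of each operand, multiply the shortened values, and shift back.
    In hardware this removes rows of the partial-product array, trading a
    bounded relative error for lower switching energy."""
    return ((a >> drop_bits) * (b >> drop_bits)) << (2 * drop_bits)

exact = 2000 * 3000                 # 6,000,000
approx = approx_mul(2000, 3000)     # 5,984,000
print(abs(exact - approx) / exact)  # ~0.003 relative error
```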
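The third innovation, IFM reuse with over-writeback, lets each output row overwrite an input row that no later computation will read, so a single buffer serves both the input and output feature maps. The sketch below shows the bookkeeping for a single-channel 3×3, stride-1 convolution; the row-streamed layout and the one-row writeback delay are illustrative assumptions about how such a scheme can work, not the paper's exact dataflow.

```python
import numpy as np

def conv3x3_overwriteback(buf, kernel):
    """Row-streamed 3x3 convolution (stride 1, zero padding) computed in a
    single shared buffer: output row r is held back one step, then written
    over input row r-1, which no later output row reads. `buf` enters
    holding the IFM and returns holding the OFM in place."""
    H, W = buf.shape
    zeros = np.zeros(W)
    pending = None                        # one-row writeback delay
    for r in range(H):
        # Gather the three input rows this output row depends on.
        top = buf[r - 1] if r > 0 else zeros
        mid = buf[r]
        bot = buf[r + 1] if r < H - 1 else zeros
        win = np.pad(np.stack([top, mid, bot]), ((0, 0), (1, 1)))
        out = np.array([np.sum(win[:, c:c + 3] * kernel) for c in range(W)])
        if pending is not None:
            buf[r - 1] = pending          # input row r-1 is now dead
        pending = out
    buf[H - 1] = pending
    return buf
```

Because output row r is delayed one step before overwriting input row r-1, every read of rows r-1, r, and r+1 completes before that slot is recycled; this kind of discipline is what allows on-chip capacity to drop toward half, in line with the savings the abstract reports.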
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, May 2024, Vol. 32, No. 5, pp. 797-809
Main Authors: Li, Zeju; Wang, Qinfan; Zou, Zihan; Shen, Qiao; Xie, Na; Cai, Hao; Zhang, Hao; Liu, Bo
Format: Article
Language: English
Publisher: IEEE, New York
Subjects: Approximate computing; Artificial neural networks; Chips (memory devices); Computer architecture; Energy efficiency; energy-efficient design; Error analysis; Feature maps; Hardware; Internet of Things (IoT); layerwise quantization; Microprocessors; neural network (NN) processor; Neural networks; Power demand; Power management; Quantization (signal); System-on-chip; Tensors
DOI: 10.1109/TVLSI.2024.3369648
ISSN: 1063-8210
EISSN: 1557-9999
CODEN: IEVSE9
Source: IEEE Electronic Library (IEL) Journals