Power optimization through peripheral circuit reusing integrated with loop tiling for RRAM crossbar-based CNN
Convolutional neural networks (CNNs) have been proposed for wide adoption to make predictions on large amounts of data in modern embedded systems. Prior studies have shown that convolutional computations, which consist of numerous multiply-and-accumulate (MAC) operations, are the most comp...
Main Authors: | Ni, Yuanhui; Chen, Weiwen; Cui, Wenjuan; Zhou, Yuanchun; Qiu, Keni |
---|---|
Format: | Conference Proceeding |
Language: | English |
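Subjects: | Data transfer; Embedded systems; Energy consumption; Field programmable gate arrays; Kernel; Power demand; Schedules |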
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 1186 |
container_issue | |
container_start_page | 1183 |
container_title | |
container_volume | |
creator | Ni, Yuanhui; Chen, Weiwen; Cui, Wenjuan; Zhou, Yuanchun; Qiu, Keni |
description | Convolutional neural networks (CNNs) have been proposed for wide adoption to make predictions on large amounts of data in modern embedded systems. Prior studies have shown that convolutional computations, which consist of numerous multiply-and-accumulate (MAC) operations, are the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, CNN implementation in an RRAM crossbar-based computing system (RCS) offers high performance and low power. However, the current design is energy-unbalanced across its three parts: RRAM crossbar computation, peripheral circuits, and memory accesses. The latter two can significantly limit the potential gains of RCS. To address the high power overhead of peripheral circuits in RCS, the Peripheral Circuit Unit (PeriCU)-Reuse scheme has been proposed to meet a given power budget. This paper further observes that memory accesses can be bypassed if two adjacent layers are assigned to different PeriCUs, which reduces memory accesses and thus improves performance and power. A loop tiling technique is proposed to save memory accesses. Experiments on two convolutional applications show that the proposed loop tiling technique can reduce energy consumption by 61.7%. |
doi_str_mv | 10.23919/DATE.2018.8342193 |
format | conference_proceeding |
date | 2018-03 |
eisbn | 9783981926309; 3981926307 |
linktohtml | https://ieeexplore.ieee.org/document/8342193 |
publisher | EDAA |
tpages | 4 |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 1558-1101 |
ispartof | 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, p.1183-1186 |
issn | 1558-1101 |
language | eng |
recordid | cdi_ieee_primary_8342193 |
source | IEEE Xplore All Conference Series |
subjects | Data transfer; Embedded systems; Energy consumption; Field programmable gate arrays; Kernel; Power demand; Schedules |
title | Power optimization through peripheral circuit reusing integrated with loop tiling for RRAM crossbar-based CNN |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T13%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Power%20optimization%20through%20peripheral%20circuit%20reusing%20integrated%20with%20loop%20tiling%20for%20RRAM%20crossbar-based%20CNN&rft.btitle=2018%20Design,%20Automation%20&%20Test%20in%20Europe%20Conference%20&%20Exhibition%20(DATE)&rft.au=Ni,%20Yuanhui&rft.date=2018-03&rft.spage=1183&rft.epage=1186&rft.pages=1183-1186&rft.eissn=1558-1101&rft_id=info:doi/10.23919/DATE.2018.8342193&rft.eisbn=9783981926309&rft.eisbn_list=3981926307&rft_dat=%3Cieee_CHZPO%3E8342193%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-4c2ec5135d7009b22ae6d5d20b9e18e925e2c48cfd9d23924800c337a099fc153%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8342193&rfr_iscdi=true |