Power optimization through peripheral circuit reusing integrated with loop tiling for RRAM crossbar-based CNN

Bibliographic Details
Main Authors: Ni, Yuanhui; Chen, Weiwen; Cui, Wenjuan; Zhou, Yuanchun; Qiu, Keni
Format: Conference Proceeding
Language: English
container_end_page 1186
container_start_page 1183
creator Ni, Yuanhui
Chen, Weiwen
Cui, Wenjuan
Zhou, Yuanchun
Qiu, Keni
description Convolutional neural networks (CNNs) have been proposed for wide adoption to make predictions on large amounts of data in modern embedded systems. Prior studies have shown that convolutional computations, which consist of large numbers of multiply-and-accumulate (MAC) operations, are the most computationally expensive portion of a CNN. Compared with executing MAC operations on GPUs and FPGAs, implementing a CNN in an RRAM crossbar-based computing system (RCS) offers the advantages of high performance and low power. However, the current design is energy-unbalanced among its three parts: RRAM crossbar computation, peripheral circuits, and memory accesses. The latter two can significantly limit the potential gains of RCS. To address the high power overhead of peripheral circuits in RCS, the Peripheral Circuit Unit (PeriCU)-Reuse scheme has been proposed to meet a given power budget. This paper further observes that memory accesses can be bypassed if two adjacent layers are assigned to different PeriCUs. Memory accesses are thereby reduced, which improves both performance and power. A loop tiling technique is proposed to save memory accesses. Experiments on two convolutional applications show that the proposed loop tiling technique can reduce energy consumption by 61.7%.
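The description names loop tiling as the way to save memory accesses but the record gives no further detail. As a rough illustration of the general idea only (not the authors' RCS or PeriCU scheme), the sketch below tiles the output loops of a plain single-channel 2D convolution so that each tile reads its input patch into a small local buffer once and reuses it for every MAC in the tile. The function names, the `tile` parameter, and the list-of-lists data layout are all assumptions made for this example.

```python
def conv2d(image, kernel):
    """Untiled reference: valid 2D convolution (cross-correlation form)."""
    H, W = len(image), len(image[0])
    K = len(kernel)  # assumes a square K x K kernel
    out = [[0] * (W - K + 1) for _ in range(H - K + 1)]
    for r in range(H - K + 1):
        for c in range(W - K + 1):
            acc = 0
            for i in range(K):
                for j in range(K):
                    acc += image[r + i][c + j] * kernel[i][j]  # one MAC
            out[r][c] = acc
    return out


def conv2d_tiled(image, kernel, tile=2):
    """Same result, but the output loops are split into tile x tile blocks."""
    H, W = len(image), len(image[0])
    K = len(kernel)
    out_h, out_w = H - K + 1, W - K + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r0 in range(0, out_h, tile):          # loop over tile origins
        for c0 in range(0, out_w, tile):
            r1 = min(r0 + tile, out_h)        # clip tiles at the border
            c1 = min(c0 + tile, out_w)
            # Hypothetical local buffer: fetch the input patch once per tile.
            patch = [row[c0:c1 + K - 1] for row in image[r0:r1 + K - 1]]
            for r in range(r0, r1):           # loops inside one tile
                for c in range(c0, c1):
                    acc = 0
                    for i in range(K):
                        for j in range(K):
                            acc += patch[r - r0 + i][c - c0 + j] * kernel[i][j]
                    out[r][c] = acc
    return out
```

In this sketch the tiled version touches the large `image` array only when filling `patch`, so the number of reads from the big array depends on the tile size rather than on the number of MACs. How the paper maps tiles onto crossbars and PeriCUs is not stated in this record.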
doi_str_mv 10.23919/DATE.2018.8342193
format conference_proceeding
publisher EDAA
eisbn 9783981926309
3981926307
creationdate 2018-03
tpages 4
linktohtml https://ieeexplore.ieee.org/document/8342193
identifier EISSN: 1558-1101
ispartof 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2018, p.1183-1186
issn 1558-1101
language eng
recordid cdi_ieee_primary_8342193
source IEEE Xplore All Conference Series
subjects Data transfer
Embedded systems
Energy consumption
Field programmable gate arrays
Kernel
Power demand
Schedules
title Power optimization through peripheral circuit reusing integrated with loop tiling for RRAM crossbar-based CNN
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T13%3A56%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Power%20optimization%20through%20peripheral%20circuit%20reusing%20integrated%20with%20loop%20tiling%20for%20RRAM%20crossbar-based%20CNN&rft.btitle=2018%20Design,%20Automation%20&%20Test%20in%20Europe%20Conference%20&%20Exhibition%20(DATE)&rft.au=Ni,%20Yuanhui&rft.date=2018-03&rft.spage=1183&rft.epage=1186&rft.pages=1183-1186&rft.eissn=1558-1101&rft_id=info:doi/10.23919/DATE.2018.8342193&rft.eisbn=9783981926309&rft.eisbn_list=3981926307&rft_dat=%3Cieee_CHZPO%3E8342193%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-4c2ec5135d7009b22ae6d5d20b9e18e925e2c48cfd9d23924800c337a099fc153%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8342193&rfr_iscdi=true