Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning
Choices of dataflows, which are known as intra-core neural network (NN) computation loop nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing the full-layer mappings based on two reinforcement learning algorithms including A2C and PPO.
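The abstract frames inter-core hardware mapping as a sequential decision problem: an agent places one NN layer at a time onto a NoC core and is rewarded for low time and energy. The following is a toy sketch of that framing only, not the paper's actual environment or reward; the core count, layer workloads, and cost model are all hypothetical.

```python
# Toy sketch (not the paper's implementation): mapping NN layers onto NoC
# cores as a sequential decision problem. At each step the policy picks a
# core for the next layer; the episode reward penalizes a made-up
# time-energy estimate. All constants are hypothetical.

N_CORES = 4                     # hypothetical 2x2 NoC mesh
LAYER_LOADS = [8, 4, 6, 2, 5]   # hypothetical per-layer workloads

def episode(policy):
    """Map each layer to a core in sequence; return (mapping, reward)."""
    mapping, core_load = [], [0] * N_CORES
    for load in LAYER_LOADS:
        core = policy(core_load)        # action: choose a core
        mapping.append(core)
        core_load[core] += load         # state update
    exec_time = max(core_load)          # toy latency: busiest core dominates
    energy = sum(core_load)             # toy energy: total work performed
    return mapping, -(exec_time * energy)  # reward: negative time-energy product

def greedy(core_load):
    """Baseline policy: always pick the least-loaded core."""
    return core_load.index(min(core_load))

mapping, reward = episode(greedy)
```

An RL agent such as A2C or PPO would replace the hand-written `greedy` policy with a learned one and would additionally have to respect the soft and hard mapping constraints the paper combines, which this sketch omits.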
Published in: | IEEE transactions on computers 2024-12, Vol.73 (12), p.2882-2896 |
---|---|
Main Authors: | Xue, Yongqi; Ji, Jinlun; Yu, Xinming; Zhou, Shize; Li, Siyue; Li, Xinyi; Cheng, Tong; Li, Shiping; Chen, Kai; Lu, Zhonghai; Li, Li; Fu, Yuxiang |
Format: | Article |
Language: | English |
Subjects: | Artificial neural networks; Biological neural networks; Computer architecture; Hardware; hardware mapping; Network-on-chip; Neural networks; Reinforcement learning; Task analysis |
---|---|
container_end_page | 2896 |
container_issue | 12 |
container_start_page | 2882 |
container_title | IEEE transactions on computers |
container_volume | 73 |
creator | Xue, Yongqi Ji, Jinlun Yu, Xinming Zhou, Shize Li, Siyue Li, Xinyi Cheng, Tong Li, Shiping Chen, Kai Lu, Zhonghai Li, Li Fu, Yuxiang |
description | Choices of dataflows, which are known as intra-core neural network (NN) computation loop nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing the full-layer mappings based on two reinforcement learning algorithms including A2C and PPO. Combining soft and hard constraints, this work transforms the mapping configuration into a sequential decision problem and aims to explore performance- and energy-efficient hardware mappings for NoC systems. We evaluate the performance of the proposed framework on 10 experimental neural networks. The results show that compared with the direct-X mapping, the direct-Y mapping, GA-based mapping, and NN-aware mapping, our optimization framework reduces the average execution time of the 10 experimental NNs by 9.09%, improves the throughput by 11.27%, reduces the energy by 12.62%, and reduces the time-energy product (TEP) by 14.49%. The results also show that the performance enhancement is related to the coefficient of variation of the neural network to be computed. |
doi_str_mv | 10.1109/TC.2024.3441822 |
format | article |
identifier | ISSN: 0018-9340 |
ispartof | IEEE transactions on computers, 2024-12, Vol.73 (12), p.2882-2896 |
issn | 0018-9340; 1557-9956 (EISSN) |
language | eng |
source | IEEE Electronic Library (IEL) Journals |
subjects | Artificial neural networks; Biological neural networks; Computer architecture; Hardware; hardware mapping; Network-on-chip; Neural networks; Reinforcement learning; Task analysis |
title | Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning |
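The description field reports results in terms of two simple quantities: the time-energy product (TEP, execution time multiplied by energy) and the coefficient of variation (CV, standard deviation divided by mean) of the network being computed. A minimal sketch of both, with hypothetical workload numbers:

```python
# Hedged sketch of the two metrics named in the abstract. TEP is execution
# time times energy (lower is better); CV measures how uneven a network's
# per-layer workload is. The example values are hypothetical.

from statistics import mean, pstdev

def tep(exec_time, energy):
    """Time-energy product: exec_time * energy."""
    return exec_time * energy

def coefficient_of_variation(layer_workloads):
    """Dispersion of per-layer work: population std / mean."""
    return pstdev(layer_workloads) / mean(layer_workloads)

uniform_net = [4, 4, 4, 4]   # identical layers -> CV = 0
skewed_net = [2, 4, 6]       # uneven layers -> CV > 0
```

A network with CV = 0 has identical layers and leaves little room for a smarter mapping to balance load, which is consistent with the abstract's observation that the performance enhancement is related to the CV of the network.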