Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures
The coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures with the advantages of high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of...
Published in: | IEEE transactions on very large scale integration (VLSI) systems 2016-05, Vol.24 (5), p.1895-1908 |
---|---|
Main Authors: | Yin, Shouyi; Yao, Xianqing; Liu, Dajiang; Liu, Leibo; Wei, Shaojun |
Format: | Article |
Language: | English |
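Subjects: | Arrays; Cascading style sheets; Coarse-grained reconfigurable architectures (CGRAs); Kernel; loop pipelining; memory-aware mapping; modulo scheduling; Registers; Routing; Topology |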
cited_by | cdi_FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693 |
---|---|
cites | cdi_FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693 |
container_end_page | 1908 |
container_issue | 5 |
container_start_page | 1895 |
container_title | IEEE transactions on very large scale integration (VLSI) systems |
container_volume | 24 |
creator | Yin, Shouyi ; Yao, Xianqing ; Liu, Dajiang ; Liu, Leibo ; Wei, Shaojun |
description | The coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures with the advantages of high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of memory access and the limited communication bandwidth between the processing element (PE) array and local memory, previous works trying to solve the routing problem are mainly confined in the internal resources of PE arrays (e.g., PEs and registers). Inevitably, routing with PEs or registers will consume a lot of computational resources and cause the increase of the initiation interval. To solve this problem, this paper makes two contributions: 1) establishing a precise formulation for the CGRA mapping problem while using shared local data memory as a routing resource and 2) extracting an effective approach for mapping loops to CGRAs. The experimental results on loops of the SPEC2006, Livermore, and MiBench show that our approach (called MEMMap) can improve the performance of the kernels on CGRA up to 1.62×, 1.58×, 1.28×, and 1.23× compared with the edge-centric modulo scheduling, EPIMap, REGIMap, and force-directed map, respectively, with an acceptable increase in compilation time. |
doi_str_mv | 10.1109/TVLSI.2015.2474129 |
format | article |
fullrecord | <record><control><sourceid>proquest_ieee_</sourceid><recordid>TN_cdi_ieee_primary_7273971</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>7273971</ieee_id><sourcerecordid>4047112671</sourcerecordid><originalsourceid>FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693</originalsourceid><addsrcrecordid>eNo9kNFLwzAQxoMoOKf_gL4UfM7MJWnTPo6h26BD0OlrSNvL7NiamrTI_ns7N7yXO47vu-_4EXIPbALAsqf1Z_6-nHAG8YRLJYFnF2QEcaxoNtTlMLNE0JQDuyY3IWwZAykzNiKLFe6dP9Dpj_EY5c610cq0bd1sItdEM2d8QDr3pm6wit6wdI2tN703xQ6jqS-_6g7LrvcYbsmVNbuAd-c-Jh8vz-vZguav8-VsmtNSJNDRVFlZxSbmSVomGGdFUgnE4UeGtmCsQiGKylgurZSpVCblzJoEh10FUCSZGJPH093Wu-8eQ6e3rvfNEKlBpQpSLoUYVPykKr0LwaPVra_3xh80MH0kpv-I6SMxfSY2mB5OphoR_w2KK5EpEL-SimgA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1787182433</pqid></control><display><type>article</type><title>Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Yin, Shouyi ; Yao, Xianqing ; Liu, Dajiang ; Liu, Leibo ; Wei, Shaojun</creator><creatorcontrib>Yin, Shouyi ; Yao, Xianqing ; Liu, Dajiang ; Liu, Leibo ; Wei, Shaojun</creatorcontrib><description>The coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures with the advantages of high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of memory access and the limited communication bandwidth between the processing element (PE) array and local memory, previous works trying to solve the routing problem are mainly confined in the internal resources of PE arrays (e.g., PEs and registers). Inevitably, routing with PEs or registers will consume a lot of computational resources and cause the increase of the initiation interval. 
To solve this problem, this paper makes two contributions: 1) establishing a precise formulation for the CGRA mapping problem while using shared local data memory as a routing resource and 2) extracting an effective approach for mapping loops to CGRAs. The experimental results on loops of the SPEC2006, Livermore, and MiBench show that our approach (called MEMMap) can improve the performance of the kernels on CGRA up to 1.62×, 1.58×, 1.28×, and 1.23× compared with the edge-centric modulo scheduling, EPIMap, REGIMap, and force-directed map, respectively, with an acceptable increase in compilation time.</description><identifier>ISSN: 1063-8210</identifier><identifier>EISSN: 1557-9999</identifier><identifier>DOI: 10.1109/TVLSI.2015.2474129</identifier><identifier>CODEN: IEVSE9</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Arrays ; Cascading style sheets ; Coarse-grained reconfigurable architectures (CGRAs) ; Kernel ; loop pipelining ; memory-aware mapping ; modulo scheduling ; Registers ; Routing ; Topology</subject><ispartof>IEEE transactions on very large scale integration (VLSI) systems, 2016-05, Vol.24 (5), p.1895-1908</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2016</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693</citedby><cites>FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/7273971$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids></links><search><creatorcontrib>Yin, Shouyi</creatorcontrib><creatorcontrib>Yao, Xianqing</creatorcontrib><creatorcontrib>Liu, Dajiang</creatorcontrib><creatorcontrib>Liu, Leibo</creatorcontrib><creatorcontrib>Wei, Shaojun</creatorcontrib><title>Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures</title><title>IEEE transactions on very large scale integration (VLSI) systems</title><addtitle>TVLSI</addtitle><description>The coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures with the advantages of high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of memory access and the limited communication bandwidth between the processing element (PE) array and local memory, previous works trying to solve the routing problem are mainly confined in the internal resources of PE arrays (e.g., PEs and registers). Inevitably, routing with PEs or registers will consume a lot of computational resources and cause the increase of the initiation interval. 
To solve this problem, this paper makes two contributions: 1) establishing a precise formulation for the CGRA mapping problem while using shared local data memory as a routing resource and 2) extracting an effective approach for mapping loops to CGRAs. The experimental results on loops of the SPEC2006, Livermore, and MiBench show that our approach (called MEMMap) can improve the performance of the kernels on CGRA up to 1.62×, 1.58×, 1.28×, and 1.23× compared with the edge-centric modulo scheduling, EPIMap, REGIMap, and force-directed map, respectively, with an acceptable increase in compilation time.</description><subject>Arrays</subject><subject>Cascading style sheets</subject><subject>Coarse-grained reconfigurable architectures (CGRAs)</subject><subject>Kernel</subject><subject>loop pipelining</subject><subject>memory-aware mapping</subject><subject>modulo scheduling</subject><subject>Registers</subject><subject>Routing</subject><subject>Topology</subject><issn>1063-8210</issn><issn>1557-9999</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNo9kNFLwzAQxoMoOKf_gL4UfM7MJWnTPo6h26BD0OlrSNvL7NiamrTI_ns7N7yXO47vu-_4EXIPbALAsqf1Z_6-nHAG8YRLJYFnF2QEcaxoNtTlMLNE0JQDuyY3IWwZAykzNiKLFe6dP9Dpj_EY5c610cq0bd1sItdEM2d8QDr3pm6wit6wdI2tN703xQ6jqS-_6g7LrvcYbsmVNbuAd-c-Jh8vz-vZguav8-VsmtNSJNDRVFlZxSbmSVomGGdFUgnE4UeGtmCsQiGKylgurZSpVCblzJoEh10FUCSZGJPH093Wu-8eQ6e3rvfNEKlBpQpSLoUYVPykKr0LwaPVra_3xh80MH0kpv-I6SMxfSY2mB5OphoR_w2KK5EpEL-SimgA</recordid><startdate>201605</startdate><enddate>201605</enddate><creator>Yin, Shouyi</creator><creator>Yao, Xianqing</creator><creator>Liu, Dajiang</creator><creator>Liu, Leibo</creator><creator>Wei, Shaojun</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SP</scope><scope>8FD</scope><scope>L7M</scope></search><sort><creationdate>201605</creationdate><title>Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures</title><author>Yin, Shouyi ; Yao, Xianqing ; Liu, Dajiang ; Liu, Leibo ; Wei, Shaojun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Arrays</topic><topic>Cascading style sheets</topic><topic>Coarse-grained reconfigurable architectures (CGRAs)</topic><topic>Kernel</topic><topic>loop pipelining</topic><topic>memory-aware mapping</topic><topic>modulo scheduling</topic><topic>Registers</topic><topic>Routing</topic><topic>Topology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yin, Shouyi</creatorcontrib><creatorcontrib>Yao, Xianqing</creatorcontrib><creatorcontrib>Liu, Dajiang</creatorcontrib><creatorcontrib>Liu, Leibo</creatorcontrib><creatorcontrib>Wei, Shaojun</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) Online</collection><collection>IEL</collection><collection>CrossRef</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yin, Shouyi</au><au>Yao, Xianqing</au><au>Liu, Dajiang</au><au>Liu, Leibo</au><au>Wei, 
Shaojun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures</atitle><jtitle>IEEE transactions on very large scale integration (VLSI) systems</jtitle><stitle>TVLSI</stitle><date>2016-05</date><risdate>2016</risdate><volume>24</volume><issue>5</issue><spage>1895</spage><epage>1908</epage><pages>1895-1908</pages><issn>1063-8210</issn><eissn>1557-9999</eissn><coden>IEVSE9</coden><abstract>The coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures with the advantages of high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of memory access and the limited communication bandwidth between the processing element (PE) array and local memory, previous works trying to solve the routing problem are mainly confined in the internal resources of PE arrays (e.g., PEs and registers). Inevitably, routing with PEs or registers will consume a lot of computational resources and cause the increase of the initiation interval. To solve this problem, this paper makes two contributions: 1) establishing a precise formulation for the CGRA mapping problem while using shared local data memory as a routing resource and 2) extracting an effective approach for mapping loops to CGRAs. The experimental results on loops of the SPEC2006, Livermore, and MiBench show that our approach (called MEMMap) can improve the performance of the kernels on CGRA up to 1.62×, 1.58×, 1.28×, and 1.23× compared with the edge-centric modulo scheduling, EPIMap, REGIMap, and force-directed map, respectively, with an acceptable increase in compilation time.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TVLSI.2015.2474129</doi><tpages>14</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1063-8210 |
ispartof | IEEE transactions on very large scale integration (VLSI) systems, 2016-05, Vol.24 (5), p.1895-1908 |
issn | 1063-8210 1557-9999 |
language | eng |
recordid | cdi_ieee_primary_7273971 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Arrays ; Cascading style sheets ; Coarse-grained reconfigurable architectures (CGRAs) ; Kernel ; loop pipelining ; memory-aware mapping ; modulo scheduling ; Registers ; Routing ; Topology |
title | Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A34%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Memory-Aware%20Loop%20Mapping%20on%20Coarse-Grained%20Reconfigurable%20Architectures&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Yin,%20Shouyi&rft.date=2016-05&rft.volume=24&rft.issue=5&rft.spage=1895&rft.epage=1908&rft.pages=1895-1908&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2015.2474129&rft_dat=%3Cproquest_ieee_%3E4047112671%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1787182433&rft_id=info:pmid/&rft_ieee_id=7273971&rfr_iscdi=true |