
Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures

Coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures that offer high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of...

Bibliographic Details
Published in: IEEE transactions on very large scale integration (VLSI) systems, 2016-05, Vol.24 (5), p.1895-1908
Main Authors: Yin, Shouyi; Yao, Xianqing; Liu, Dajiang; Liu, Leibo; Wei, Shaojun
Format: Article
Language: English
container_end_page 1908
container_issue 5
container_start_page 1895
container_title IEEE transactions on very large scale integration (VLSI) systems
container_volume 24
creator Yin, Shouyi
Yao, Xianqing
Liu, Dajiang
Liu, Leibo
Wei, Shaojun
description Coarse-grained reconfigurable architectures (CGRAs) are a promising class of architectures that offer high performance and high power efficiency. The compute-intensive parts of an application (e.g., loops) are often mapped onto the CGRA for acceleration. Due to the extra overhead of memory access and the limited communication bandwidth between the processing element (PE) array and local memory, previous approaches to the routing problem have mainly been confined to the internal resources of the PE array (e.g., PEs and registers). Routing with PEs or registers inevitably consumes a large share of computational resources and increases the initiation interval. To solve this problem, this paper makes two contributions: 1) establishing a precise formulation of the CGRA mapping problem that uses shared local data memory as a routing resource and 2) presenting an effective approach for mapping loops to CGRAs. Experimental results on loops from SPEC2006, Livermore, and MiBench show that the approach (called MEMMap) can improve kernel performance on the CGRA by up to 1.62×, 1.58×, 1.28×, and 1.23× compared with edge-centric modulo scheduling, EPIMap, REGIMap, and force-directed map, respectively, with an acceptable increase in compilation time.
doi_str_mv 10.1109/TVLSI.2015.2474129
format article
publisher New York: IEEE
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2016
coden IEVSE9
tpages 14
toplevel peer_reviewed
online_resources
fulltext fulltext
identifier ISSN: 1063-8210
ispartof IEEE transactions on very large scale integration (VLSI) systems, 2016-05, Vol.24 (5), p.1895-1908
issn 1063-8210
1557-9999
language eng
recordid cdi_ieee_primary_7273971
source IEEE Electronic Library (IEL) Journals
subjects Arrays
Cascading style sheets
Coarse-grained reconfigurable architectures (CGRAs)
Kernel
loop pipelining
memory-aware mapping
modulo scheduling
Registers
Routing
Topology
title Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A34%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Memory-Aware%20Loop%20Mapping%20on%20Coarse-Grained%20Reconfigurable%20Architectures&rft.jtitle=IEEE%20transactions%20on%20very%20large%20scale%20integration%20(VLSI)%20systems&rft.au=Yin,%20Shouyi&rft.date=2016-05&rft.volume=24&rft.issue=5&rft.spage=1895&rft.epage=1908&rft.pages=1895-1908&rft.issn=1063-8210&rft.eissn=1557-9999&rft.coden=IEVSE9&rft_id=info:doi/10.1109/TVLSI.2015.2474129&rft_dat=%3Cproquest_ieee_%3E4047112671%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c361t-87f4d5a5268c6e59b6d3ee9990efb00de33bdaf24f44847a820fa6ebdad11b693%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1787182433&rft_id=info:pmid/&rft_ieee_id=7273971&rfr_iscdi=true