Software-Pipelining on Multi-Core Architectures
It is becoming increasingly evident that multi-core chip architectures are emerging as a solution for efficiently amortizing the ever-growing number of transistors on a chip. However, the success of such multi-core chips depends on advances in system software technology, such as compiler and run-ti...
Main Authors: | Douillet, A. ; Gao, G.R. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
cited_by | |
---|---|
cites | |
container_end_page | 48 |
container_issue | |
container_start_page | 39 |
container_title | |
container_volume | |
creator | Douillet, A. ; Gao, G.R. |
description | It is becoming increasingly evident that multi-core chip architectures are emerging as a solution for efficiently amortizing the ever-growing number of transistors on a chip. However, the success of such multi-core chips depends on advances in system software technology, such as compilers and run-time systems, so that application programs can exploit thread-level parallelism in originally single-threaded applications and fully utilize the on-chip hardware concurrency. In this paper, we propose a method which, from a parallel or non-parallel imperfect loop nest written in a standard sequential language such as C or Fortran, automatically generates a multi-threaded software-pipelined schedule for multi-core architectures. The generated schedule already contains all the necessary synchronization instructions and is guaranteed to be free of deadlocks and buffer overflows. The feasibility of the proposed method within a modern compiler infrastructure has been verified through a pilot implementation in the Open64 compiler, tested on the IBM Cyclops multi-core architecture. Experimental results show that performance scales well even with 100 cores. Our lightweight synchronization mechanism minimizes dependency stalls and synchronization overhead without the use of dedicated hardware support. |
doi_str_mv | 10.1109/PACT.2007.4336198 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>proquest_6IE</sourceid><recordid>TN_cdi_proquest_miscellaneous_31554241</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>4336198</ieee_id><sourcerecordid>31554241</sourcerecordid><originalsourceid>FETCH-LOGICAL-i511-a4adcb7d6679543f2283598021cbb964db4106782e316cb84690f9fb1fe30cc23</originalsourceid><addsrcrecordid>eNotUMtKxEAQHHyA67ofIF5y8pbs9LySOYbgC1ZcMAdvYTLp6Eg2iTMJ4t8b2O1DVUEVVdCE3AJNAKje7vOiTBilaSI4V6CzM7JiSkCcaiHOyTVNlZZs0fKCrIBmejHkxxXZhPBNl-NaLbgi2_ehnX6Nx3jvRuxc7_rPaOij17mbXFwMHqPc2y83oZ1mj-GGXLamC7g58ZqUjw9l8Rzv3p5einwXOwkQG2EaW6eNUsuq4C1jGZc6owxsXWslmloAVWnGkIOydSaUpq1ua2iRU2sZX5P7Y-3oh58Zw1QdXLDYdabHYQ4VBykFE7AE745Bh4jV6N3B-L_q9BP-DybhUbc</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>31554241</pqid></control><display><type>conference_proceeding</type><title>Software-Pipelining on Multi-Core Architectures</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Douillet, A. ; Gao, G.R.</creator><creatorcontrib>Douillet, A. ; Gao, G.R.</creatorcontrib><description>It is becoming increasingly evident that multi-core chip architecture are emerging as a solution to efficiently amortizing the ever-growing number of transistors on a chip. However the success of such multi-core chips depends on the advances in system software technology, such as compiler and run-time system, in order for the application programs to exploit thread level parallelism out of originally single-threaded applications and to fully utilize the hardware on-chip concurrency. In this paper, we propose a method which, from a parallel and non-parallel imperfect loop nest written in a standard sequential language such as C or Fortran, automatically generates a multi-threaded software-pipelined schedule for multi-core architectures. 
The generated schedule already contains all the necessary synchronization instructions and is guaranteed free of deadlocks and buffer overflow. The feasibility of the proposed method within a modern compiler infrastructure has been verified through a pilot implementation in the Open64 compiler and tested on the IBM Cyclops multi-core architecture. Experimental results show that the performance exhibits good scalability even with 100 cores. Our light-weight synchronization mechanism minimizes the dependencies stalls and synchronization overheads without the use of dedicated hardware support.</description><identifier>ISSN: 1089-795X</identifier><identifier>ISBN: 0769529445</identifier><identifier>ISBN: 9780769529448</identifier><identifier>EISSN: 2641-7944</identifier><identifier>DOI: 10.1109/PACT.2007.4336198</identifier><language>eng</language><publisher>IEEE</publisher><subject>Application software ; Computer architecture ; Concurrent computing ; Hardware ; Program processors ; Software standards ; System software ; System-on-a-chip ; Transistors ; Yarn</subject><ispartof>16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, p.39-48</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/4336198$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/4336198$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Douillet, A.</creatorcontrib><creatorcontrib>Gao, G.R.</creatorcontrib><title>Software-Pipelining on Multi-Core Architectures</title><title>16th International Conference on Parallel Architecture and Compilation Techniques (PACT 
2007)</title><addtitle>PACT</addtitle><description>It is becoming increasingly evident that multi-core chip architecture are emerging as a solution to efficiently amortizing the ever-growing number of transistors on a chip. However the success of such multi-core chips depends on the advances in system software technology, such as compiler and run-time system, in order for the application programs to exploit thread level parallelism out of originally single-threaded applications and to fully utilize the hardware on-chip concurrency. In this paper, we propose a method which, from a parallel and non-parallel imperfect loop nest written in a standard sequential language such as C or Fortran, automatically generates a multi-threaded software-pipelined schedule for multi-core architectures. The generated schedule already contains all the necessary synchronization instructions and is guaranteed free of deadlocks and buffer overflow. The feasibility of the proposed method within a modern compiler infrastructure has been verified through a pilot implementation in the Open64 compiler and tested on the IBM Cyclops multi-core architecture. Experimental results show that the performance exhibits good scalability even with 100 cores. 
Our light-weight synchronization mechanism minimizes the dependencies stalls and synchronization overheads without the use of dedicated hardware support.</description><subject>Application software</subject><subject>Computer architecture</subject><subject>Concurrent computing</subject><subject>Hardware</subject><subject>Program processors</subject><subject>Software standards</subject><subject>System software</subject><subject>System-on-a-chip</subject><subject>Transistors</subject><subject>Yarn</subject><issn>1089-795X</issn><issn>2641-7944</issn><isbn>0769529445</isbn><isbn>9780769529448</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2007</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotUMtKxEAQHHyA67ofIF5y8pbs9LySOYbgC1ZcMAdvYTLp6Eg2iTMJ4t8b2O1DVUEVVdCE3AJNAKje7vOiTBilaSI4V6CzM7JiSkCcaiHOyTVNlZZs0fKCrIBmejHkxxXZhPBNl-NaLbgi2_ehnX6Nx3jvRuxc7_rPaOij17mbXFwMHqPc2y83oZ1mj-GGXLamC7g58ZqUjw9l8Rzv3p5einwXOwkQG2EaW6eNUsuq4C1jGZc6owxsXWslmloAVWnGkIOydSaUpq1ua2iRU2sZX5P7Y-3oh58Zw1QdXLDYdabHYQ4VBykFE7AE745Bh4jV6N3B-L_q9BP-DybhUbc</recordid><startdate>200709</startdate><enddate>200709</enddate><creator>Douillet, A.</creator><creator>Gao, G.R.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200709</creationdate><title>Software-Pipelining on Multi-Core Architectures</title><author>Douillet, A. 
; Gao, G.R.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i511-a4adcb7d6679543f2283598021cbb964db4106782e316cb84690f9fb1fe30cc23</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2007</creationdate><topic>Application software</topic><topic>Computer architecture</topic><topic>Concurrent computing</topic><topic>Hardware</topic><topic>Program processors</topic><topic>Software standards</topic><topic>System software</topic><topic>System-on-a-chip</topic><topic>Transistors</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Douillet, A.</creatorcontrib><creatorcontrib>Gao, G.R.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Douillet, A.</au><au>Gao, G.R.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Software-Pipelining on Multi-Core Architectures</atitle><btitle>16th International Conference on Parallel Architecture and Compilation Techniques (PACT 
2007)</btitle><stitle>PACT</stitle><date>2007-09</date><risdate>2007</risdate><spage>39</spage><epage>48</epage><pages>39-48</pages><issn>1089-795X</issn><eissn>2641-7944</eissn><isbn>0769529445</isbn><isbn>9780769529448</isbn><abstract>It is becoming increasingly evident that multi-core chip architecture are emerging as a solution to efficiently amortizing the ever-growing number of transistors on a chip. However the success of such multi-core chips depends on the advances in system software technology, such as compiler and run-time system, in order for the application programs to exploit thread level parallelism out of originally single-threaded applications and to fully utilize the hardware on-chip concurrency. In this paper, we propose a method which, from a parallel and non-parallel imperfect loop nest written in a standard sequential language such as C or Fortran, automatically generates a multi-threaded software-pipelined schedule for multi-core architectures. The generated schedule already contains all the necessary synchronization instructions and is guaranteed free of deadlocks and buffer overflow. The feasibility of the proposed method within a modern compiler infrastructure has been verified through a pilot implementation in the Open64 compiler and tested on the IBM Cyclops multi-core architecture. Experimental results show that the performance exhibits good scalability even with 100 cores. Our light-weight synchronization mechanism minimizes the dependencies stalls and synchronization overheads without the use of dedicated hardware support.</abstract><pub>IEEE</pub><doi>10.1109/PACT.2007.4336198</doi><tpages>10</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1089-795X |
ispartof | 16th International Conference on Parallel Architecture and Compilation Techniques (PACT 2007), 2007, p.39-48 |
issn | 1089-795X ; 2641-7944 |
language | eng |
recordid | cdi_proquest_miscellaneous_31554241 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Application software ; Computer architecture ; Concurrent computing ; Hardware ; Program processors ; Software standards ; System software ; System-on-a-chip ; Transistors ; Yarn |
title | Software-Pipelining on Multi-Core Architectures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T05%3A13%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Software-Pipelining%20on%20Multi-Core%20Architectures&rft.btitle=16th%20International%20Conference%20on%20Parallel%20Architecture%20and%20Compilation%20Techniques%20(PACT%202007)&rft.au=Douillet,%20A.&rft.date=2007-09&rft.spage=39&rft.epage=48&rft.pages=39-48&rft.issn=1089-795X&rft.eissn=2641-7944&rft.isbn=0769529445&rft.isbn_list=9780769529448&rft_id=info:doi/10.1109/PACT.2007.4336198&rft_dat=%3Cproquest_6IE%3E31554241%3C/proquest_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i511-a4adcb7d6679543f2283598021cbb964db4106782e316cb84690f9fb1fe30cc23%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31554241&rft_id=info:pmid/&rft_ieee_id=4336198&rfr_iscdi=true |