An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor
Published in: | Journal of systems architecture, 2016-08, Vol.68, p.17-37 |
---|---|
Main Authors: | Parker, Samuel J. ; Chouliaras, Vassilios A. |
Format: | Article |
Language: | English |
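Subjects: | Architecture ; Architecture (computers) ; Benchmarks ; Compilation ; Computer architecture ; Computer programs ; Computer simulation ; FPGA ; Hardware ; Heterogeneous computing ; Multi-core ; Multiprocessor ; OpenCL ; Scalability ; Software ; Studies ; Systems design |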
cited_by | cdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93 |
---|---|
cites | cdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93 |
container_end_page | 37 |
container_issue | |
container_start_page | 17 |
container_title | Journal of systems architecture |
container_volume | 68 |
creator | Parker, Samuel J. ; Chouliaras, Vassilios A. |
description | Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and have thus evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment that enables heterogeneous system design through the use of OpenCL and a highly configurable VLIW Chip Multiprocessor (CMP) architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing, which maps better onto the instruction-level-parallel (ILP) processor cores of the LE1 architecture. This paper discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and Rodinia suites. The benchmarks are executed on 40 LE1 configurations, 10 implemented on an SoC-FPGA and the remaining 30 on a cycle-accurate simulator. Across the 12 OpenCL benchmarks, results demonstrate a near-linear wall-clock performance improvement of 1.8× (using 2 dual-issue cores), up to 5.2× (using 8 dual-issue cores) and, in one case, a super-linear improvement of 8.4× (FixOffset kernel, 8 dual-issue cores). The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature. |
doi_str_mv | 10.1016/j.sysarc.2016.06.003 |
format | article |
publisher | Amsterdam: Elsevier B.V. |
rights | 2016 Elsevier B.V.; Copyright Elsevier Sequoia S.A. Aug 2016 |
els_id | S1383762116300613 |
pqid | 1810343535 |
lds50 | peer_reviewed |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1383-7621 |
ispartof | Journal of systems architecture, 2016-08, Vol.68, p.17-37 |
issn | 1383-7621 (ISSN) ; 1873-6165 (EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1835661906 |
source | ScienceDirect Freedom Collection |
subjects | Architecture ; Architecture (computers) ; Benchmarks ; Compilation ; Computer architecture ; Computer programs ; Computer simulation ; FPGA ; Hardware ; Heterogeneous computing ; Multi-core ; Multiprocessor ; OpenCL ; Scalability ; Software ; Studies ; Systems design |
title | An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T02%3A28%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20OpenCL%20software%20compilation%20framework%20targeting%20an%20SoC-FPGA%20VLIW%20chip%20multiprocessor&rft.jtitle=Journal%20of%20systems%20architecture&rft.au=Parker,%20Samuel%20J.&rft.date=2016-08&rft.volume=68&rft.spage=17&rft.epage=37&rft.pages=17-37&rft.issn=1383-7621&rft.eissn=1873-6165&rft_id=info:doi/10.1016/j.sysarc.2016.06.003&rft_dat=%3Cproquest_cross%3E1835661906%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1810343535&rft_id=info:pmid/&rfr_iscdi=true |
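The abstract credits work-item coalescing for a better fit to the LE1's ILP cores but does not spell out the transformation. As a rough illustration only, the Python model below shows the general idea as it is commonly described for CPU-style OpenCL targets: all work-items of a work-group run on one core as a loop over local IDs, and each work-group barrier splits the kernel into regions that each get their own loop (loop fission), so every work-item finishes one region before any work-item starts the next. This is a sketch under stated assumptions, not the authors' LLVM pass: here the kernel is split into regions by hand, and the names `run_coalesced`, `run_naive`, `load_region`, `store_region` and the `ctx` dictionary are hypothetical. The example kernel reverses each work-group's slice of a buffer through `__local` memory, so it is only correct if the barrier is honoured.

```python
from typing import Callable, Dict, List

# Hypothetical model: a "region" is the kernel code between two barriers,
# run for one work-item identified by (group_id, local_id).
Region = Callable[[int, int, int, Dict[str, list]], None]


def run_coalesced(regions: List[Region], global_size: int, local_size: int,
                  buffers: Dict[str, list]) -> None:
    """Run a 1-D NDRange with each work-group coalesced into loops over local IDs."""
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the local size")
    for group_id in range(global_size // local_size):
        # Work-groups are independent, so a multi-core target could hand each
        # one to a different core; here they simply run in sequence.
        ctx = dict(buffers)               # shallow copy: shares the global buffer lists
        ctx["local"] = [0] * local_size   # fresh __local buffer per work-group
        for region in regions:
            # One loop per barrier-free region: every work-item finishes this
            # region before any work-item starts the next, as a barrier requires.
            for local_id in range(local_size):
                region(group_id, local_id, local_size, ctx)


def run_naive(regions: List[Region], global_size: int, local_size: int,
              buffers: Dict[str, list]) -> None:
    """Incorrect for kernels with barriers: runs each work-item start to finish."""
    for group_id in range(global_size // local_size):
        ctx = dict(buffers)
        ctx["local"] = [0] * local_size
        for local_id in range(local_size):
            for region in regions:
                region(group_id, local_id, local_size, ctx)


# Example kernel, split by hand at its single barrier(CLK_LOCAL_MEM_FENCE):
#   local[lid] = src[gid]; barrier(...); dst[gid] = local[lsize - 1 - lid];
def load_region(group_id: int, local_id: int, local_size: int,
                ctx: Dict[str, list]) -> None:
    gid = group_id * local_size + local_id  # plays the role of get_global_id(0)
    ctx["local"][local_id] = ctx["src"][gid]


def store_region(group_id: int, local_id: int, local_size: int,
                 ctx: Dict[str, list]) -> None:
    gid = group_id * local_size + local_id
    # Reads a __local slot written by a different work-item before the barrier.
    ctx["dst"][gid] = ctx["local"][local_size - 1 - local_id]


if __name__ == "__main__":
    src = list(range(8))
    dst = [0] * 8
    run_coalesced([load_region, store_region], 8, 4, {"src": src, "dst": dst})
    print(dst)  # [3, 2, 1, 0, 7, 6, 5, 4]
```

Running each work-item start to finish (`run_naive`) lets work-item 0 read a `__local` slot before its owner has written it, which is why barrier regions have to become separate loops. One plausible reason such loops suit a VLIW core is that their iterations are independent, leaving the compiler free to schedule operations from several work-items into the same instruction bundle; the abstract itself states only that coalescing maps better onto the LE1's ILP cores.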