An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor

Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and thus have evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and...

Bibliographic Details
Published in: Journal of systems architecture, 2016-08, Vol. 68, p. 17-37
Main Authors: Parker, Samuel J.; Chouliaras, Vassilios A.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93
cites cdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93
container_end_page 37
container_issue
container_start_page 17
container_title Journal of systems architecture
container_volume 68
creator Parker, Samuel J.
Chouliaras, Vassilios A.
description Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and thus have evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment that enables heterogeneous system design through the use of OpenCL and a highly configurable VLIW chip multiprocessor (CMP) architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing, which maps better onto the ILP processor cores of the LE1 architecture. This paper discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and Rodinia suites. The benchmarks are executed on 40 LE1 configurations, with 10 implemented on an SoC-FPGA and the remaining 30 on a cycle-accurate simulator. Across the 12 OpenCL benchmarks, results demonstrate near-linear wall-clock performance improvements of 1.8× (using 2 dual-issue cores) and up to 5.2× (using 8 dual-issue cores) and, in one case, a super-linear improvement of 8.4× (FixOffset kernel, 8 dual-issue cores). The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature.
doi_str_mv 10.1016/j.sysarc.2016.06.003
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1835661906</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1383762116300613</els_id><sourcerecordid>1835661906</sourcerecordid><originalsourceid>FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93</originalsourceid><addsrcrecordid>eNp9UE1LAzEQDaKgVv-Bh4AXL1tnNt10exFKsSoUFPzoMaTZiabubtZkq_jvTaknDzIPZgbem4_H2BnCEAHl5XoYv6MOZpinbggJIPbYEZZjkUmUxX6qRSmysczxkB3HuAaAosD8iC2nLb_vqJ0tePS2_9KBuPFN52rdO99yG3RDXz68816HV-pd-8p1yx_9LJs_3Ez5y-Juyc2b63izqXvXBW8oRh9O2IHVdaTT3zxgz_Prp9lttri_uZtNF5kZoegzFGBJSGknYyCTA1lZ2qIy1cQKqdGAlpUBK1ALBFsWolytVhLHkKcXNU3EgF3s5qbNHxuKvWpcNFTXuiW_iQpLUUiJE5CJev6Huvab0KbrEgtBjESRYsBGO5YJPsZAVnXBNTp8KwS1dVut1c5ttXVbQQKIJLvaySg9--koqGgctYYqF8j0qvLu_wE_PeuJcw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1810343535</pqid></control><display><type>article</type><title>An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor</title><source>ScienceDirect Freedom Collection</source><creator>Parker, Samuel J. ; Chouliaras, Vassilios A.</creator><creatorcontrib>Parker, Samuel J. ; Chouliaras, Vassilios A.</creatorcontrib><description>Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and thus have evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a highly configurable VLIW Chip Multiprocessor architecture known as the LE1. 
An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing which better maps onto the ILP processor cores of the LE1 architecture. This paper discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and the Rodinia suites. The benchmarks are executed on 40 LE1 configurations with 10 implemented on an SoC-FPGA and the remaining on a cycle-accurate simulator. Across 12 OpenCL benchmarks results demonstrate near-linear wall-clock performance improvement of 1.8 × (using 2 dual-issue cores), up to 5.2 × (using 8 dual-issue cores) and on one case, super-linear improvement of 8.4 × (FixOffset kernel, 8 dual-issue cores). The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature.</description><identifier>ISSN: 1383-7621</identifier><identifier>EISSN: 1873-6165</identifier><identifier>DOI: 10.1016/j.sysarc.2016.06.003</identifier><language>eng</language><publisher>Amsterdam: Elsevier B.V</publisher><subject>Architecture ; Architecture (computers) ; Benchmarks ; Compilation ; Computer architecture ; Computer programs ; Computer simulation ; FPGA ; Hardware ; Heterogeneous computing ; Multi-core ; Multiprocessor ; OpenCL ; Scalability ; Software ; Studies ; Systems design</subject><ispartof>Journal of systems architecture, 2016-08, Vol.68, p.17-37</ispartof><rights>2016 Elsevier B.V.</rights><rights>Copyright Elsevier Sequoia S.A. 
Aug 2016</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93</citedby><cites>FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Parker, Samuel J.</creatorcontrib><creatorcontrib>Chouliaras, Vassilios A.</creatorcontrib><title>An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor</title><title>Journal of systems architecture</title><description>Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and thus have evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a highly configurable VLIW Chip Multiprocessor architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing which better maps onto the ILP processor cores of the LE1 architecture. This paper discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and the Rodinia suites. 
The benchmarks are executed on 40 LE1 configurations with 10 implemented on an SoC-FPGA and the remaining on a cycle-accurate simulator. Across 12 OpenCL benchmarks results demonstrate near-linear wall-clock performance improvement of 1.8 × (using 2 dual-issue cores), up to 5.2 × (using 8 dual-issue cores) and on one case, super-linear improvement of 8.4 × (FixOffset kernel, 8 dual-issue cores). The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature.</description><subject>Architecture</subject><subject>Architecture (computers)</subject><subject>Benchmarks</subject><subject>Compilation</subject><subject>Computer architecture</subject><subject>Computer programs</subject><subject>Computer simulation</subject><subject>FPGA</subject><subject>Hardware</subject><subject>Heterogeneous computing</subject><subject>Multi-core</subject><subject>Multiprocessor</subject><subject>OpenCL</subject><subject>Scalability</subject><subject>Software</subject><subject>Studies</subject><subject>Systems design</subject><issn>1383-7621</issn><issn>1873-6165</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2016</creationdate><recordtype>article</recordtype><recordid>eNp9UE1LAzEQDaKgVv-Bh4AXL1tnNt10exFKsSoUFPzoMaTZiabubtZkq_jvTaknDzIPZgbem4_H2BnCEAHl5XoYv6MOZpinbggJIPbYEZZjkUmUxX6qRSmysczxkB3HuAaAosD8iC2nLb_vqJ0tePS2_9KBuPFN52rdO99yG3RDXz68816HV-pd-8p1yx_9LJs_3Ez5y-Juyc2b63izqXvXBW8oRh9O2IHVdaTT3zxgz_Prp9lttri_uZtNF5kZoegzFGBJSGknYyCTA1lZ2qIy1cQKqdGAlpUBK1ALBFsWolytVhLHkKcXNU3EgF3s5qbNHxuKvWpcNFTXuiW_iQpLUUiJE5CJev6Huvab0KbrEgtBjESRYsBGO5YJPsZAVnXBNTp8KwS1dVut1c5ttXVbQQKIJLvaySg9--koqGgctYYqF8j0qvLu_wE_PeuJcw</recordid><startdate>201608</startdate><enddate>201608</enddate><creator>Parker, Samuel J.</creator><creator>Chouliaras, Vassilios A.</creator><general>Elsevier B.V</general><general>Elsevier Sequoia 
S.A</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>201608</creationdate><title>An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor</title><author>Parker, Samuel J. ; Chouliaras, Vassilios A.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2016</creationdate><topic>Architecture</topic><topic>Architecture (computers)</topic><topic>Benchmarks</topic><topic>Compilation</topic><topic>Computer architecture</topic><topic>Computer programs</topic><topic>Computer simulation</topic><topic>FPGA</topic><topic>Hardware</topic><topic>Heterogeneous computing</topic><topic>Multi-core</topic><topic>Multiprocessor</topic><topic>OpenCL</topic><topic>Scalability</topic><topic>Software</topic><topic>Studies</topic><topic>Systems design</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Parker, Samuel J.</creatorcontrib><creatorcontrib>Chouliaras, Vassilios A.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of systems architecture</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Parker, Samuel J.</au><au>Chouliaras, Vassilios 
A.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor</atitle><jtitle>Journal of systems architecture</jtitle><date>2016-08</date><risdate>2016</risdate><volume>68</volume><spage>17</spage><epage>37</epage><pages>17-37</pages><issn>1383-7621</issn><eissn>1873-6165</eissn><abstract>Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capability and power efficiency, and thus have evolved into heterogeneous multi-core systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This paper discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a highly configurable VLIW Chip Multiprocessor architecture known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on a number of hardware configurations of the LE1 CMP. The presented OpenCL framework fully automates the compilation flow and supports work-item coalescing which better maps onto the ILP processor cores of the LE1 architecture. This paper discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework by running 12 industry-standard OpenCL benchmarks drawn from the AMD SDK and the Rodinia suites. The benchmarks are executed on 40 LE1 configurations with 10 implemented on an SoC-FPGA and the remaining on a cycle-accurate simulator. Across 12 OpenCL benchmarks results demonstrate near-linear wall-clock performance improvement of 1.8 × (using 2 dual-issue cores), up to 5.2 × (using 8 dual-issue cores) and on one case, super-linear improvement of 8.4 × (FixOffset kernel, 8 dual-issue cores). 
The number of OpenCL benchmarks evaluated makes this study one of the most complete in the literature.</abstract><cop>Amsterdam</cop><pub>Elsevier B.V</pub><doi>10.1016/j.sysarc.2016.06.003</doi><tpages>21</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1383-7621
ispartof Journal of systems architecture, 2016-08, Vol.68, p.17-37
issn 1383-7621
1873-6165
language eng
recordid cdi_proquest_miscellaneous_1835661906
source ScienceDirect Freedom Collection
subjects Architecture
Architecture (computers)
Benchmarks
Compilation
Computer architecture
Computer programs
Computer simulation
FPGA
Hardware
Heterogeneous computing
Multi-core
Multiprocessor
OpenCL
Scalability
Software
Studies
Systems design
title An OpenCL software compilation framework targeting an SoC-FPGA VLIW chip multiprocessor
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T02%3A28%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20OpenCL%20software%20compilation%20framework%20targeting%20an%20SoC-FPGA%20VLIW%20chip%20multiprocessor&rft.jtitle=Journal%20of%20systems%20architecture&rft.au=Parker,%20Samuel%20J.&rft.date=2016-08&rft.volume=68&rft.spage=17&rft.epage=37&rft.pages=17-37&rft.issn=1383-7621&rft.eissn=1873-6165&rft_id=info:doi/10.1016/j.sysarc.2016.06.003&rft_dat=%3Cproquest_cross%3E1835661906%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c413t-130fe366f970ec20ef68f5dcd9f36a1c0a6dc0f31a310f8538bbb61702003ae93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1810343535&rft_id=info:pmid/&rfr_iscdi=true