Portable mapping of data parallel programs to OpenCL for heterogeneous systems

General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-...

Bibliographic Details
Main Authors: O'Boyle, Michael F. P., Wang, Zheng, Grewe, Dominik
Format: Conference Proceeding
Language: English
Subjects:
cited_by
cites
container_end_page 10
container_issue
container_start_page 1
container_title
container_volume
creator O'Boyle, Michael F. P.
Wang, Zheng
Grewe, Dominik
description General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.
doi_str_mv 10.1109/CGO.2013.6494993
format conference_proceeding
fullrecord <record><control><sourceid>acm_6IE</sourceid><recordid>TN_cdi_ieee_primary_6494993</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>6494993</ieee_id><sourcerecordid>acm_books_10_1109_CGO_2013_6494993</sourcerecordid><originalsourceid>FETCH-LOGICAL-a164t-e2a9572dd7858f87803b68f85d6d9686a8d8face4587054657ed454860f04adf3</originalsourceid><addsrcrecordid>eNqNkD1PwzAQho0QEqh0R2LxytDWjr9HFEFBqigDzNalvpRAUkd2GPrvSdUysDHdnZ733uEh5IazOefMLcrlel4wLuZaOumcOCNTZyyX2gilCiXP_9zSXJJpzp-MMc6E1E5dkZfXmAaoWqQd9H2z29JY0wAD0B4StC22tE9xm6DLdIh03eOuXNE6JvqBA44Edxi_M837PGCXr8lFDW3G6WlOyPvjw1v5NFutl8_l_WoGXMthhgU4ZYoQjFW2tsYyUelxUUEHp60GG2wNG5TKGqakVgaDVNJqVjMJoRYTcnvsbRDR96npIO39ycJIF0cKm85XMX5lz5k_GPOjMX8w9pv1VWrw0Hf33w_xA2Apaek</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Portable mapping of data parallel programs to OpenCL for heterogeneous systems</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>O'Boyle, Michael F. P. ; Wang, Zheng ; Grewe, Dominik</creator><creatorcontrib>O'Boyle, Michael F. P. ; Wang, Zheng ; Grewe, Dominik</creatorcontrib><description>General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. 
A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.</description><identifier>ISBN: 9781467355247</identifier><identifier>ISBN: 1467355240</identifier><identifier>EISBN: 9781467355254</identifier><identifier>EISBN: 1467355259</identifier><identifier>DOI: 10.1109/CGO.2013.6494993</identifier><language>eng</language><publisher>Washington, DC, USA: IEEE Computer Society</publisher><subject>Arrays ; Benchmark testing ; Feature extraction ; GPU ; Graphics processing units ; Indexes ; Kernel ; Learning Mapping ; Machine ; OpenCL ; Predictive models ; Software and its engineering -- Software notations and tools -- Compilers ; Software and its engineering -- Software notations and tools -- General programming languages -- Language features ; Theory of computation -- Models of computation -- Concurrency ; Theory of computation -- Models of computation -- Concurrency -- Parallel computing models</subject><ispartof>Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2013, 
p.1-10</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/6494993$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54920</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/6494993$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>O'Boyle, Michael F. P.</creatorcontrib><creatorcontrib>Wang, Zheng</creatorcontrib><creatorcontrib>Grewe, Dominik</creatorcontrib><title>Portable mapping of data parallel programs to OpenCL for heterogeneous systems</title><title>Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)</title><addtitle>CGO</addtitle><description>General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. 
We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.</description><subject>Arrays</subject><subject>Benchmark testing</subject><subject>Feature extraction</subject><subject>GPU</subject><subject>Graphics processing units</subject><subject>Indexes</subject><subject>Kernel</subject><subject>Learning Mapping</subject><subject>Machine</subject><subject>OpenCL</subject><subject>Predictive models</subject><subject>Software and its engineering -- Software notations and tools -- Compilers</subject><subject>Software and its engineering -- Software notations and tools -- General programming languages -- Language features</subject><subject>Theory of computation -- Models of computation -- Concurrency</subject><subject>Theory of computation -- Models of computation -- Concurrency -- Parallel computing models</subject><isbn>9781467355247</isbn><isbn>1467355240</isbn><isbn>9781467355254</isbn><isbn>1467355259</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2013</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNqNkD1PwzAQho0QEqh0R2LxytDWjr9HFEFBqigDzNalvpRAUkd2GPrvSdUysDHdnZ733uEh5IazOefMLcrlel4wLuZaOumcOCNTZyyX2gilCiXP_9zSXJJpzp-MMc6E1E5dkZfXmAaoWqQd9H2z29JY0wAD0B4StC22tE9xm6DLdIh03eOuXNE6JvqBA44Edxi_M837PGCXr8lFDW3G6WlOyPvjw1v5NFutl8_l_WoGXMthhgU4ZYoQjFW2tsYyUelxUUEHp60GG2wNG5TKGqakVgaDVNJqVjMJoRYTcnvsbRDR96npIO39ycJIF0cKm85XMX5lz5k_GPOjMX8w9pv1VWrw0Hf33w_xA2Apaek</recordid><startdate>20130223</startdate><enddate>20130223</enddate><creator>O'Boyle, Michael F. 
P.</creator><creator>Wang, Zheng</creator><creator>Grewe, Dominik</creator><general>IEEE Computer Society</general><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>20130223</creationdate><title>Portable mapping of data parallel programs to OpenCL for heterogeneous systems</title><author>O'Boyle, Michael F. P. ; Wang, Zheng ; Grewe, Dominik</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a164t-e2a9572dd7858f87803b68f85d6d9686a8d8face4587054657ed454860f04adf3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Arrays</topic><topic>Benchmark testing</topic><topic>Feature extraction</topic><topic>GPU</topic><topic>Graphics processing units</topic><topic>Indexes</topic><topic>Kernel</topic><topic>Learning Mapping</topic><topic>Machine</topic><topic>OpenCL</topic><topic>Predictive models</topic><topic>Software and its engineering -- Software notations and tools -- Compilers</topic><topic>Software and its engineering -- Software notations and tools -- General programming languages -- Language features</topic><topic>Theory of computation -- Models of computation -- Concurrency</topic><topic>Theory of computation -- Models of computation -- Concurrency -- Parallel computing models</topic><toplevel>online_resources</toplevel><creatorcontrib>O'Boyle, Michael F. 
P.</creatorcontrib><creatorcontrib>Wang, Zheng</creatorcontrib><creatorcontrib>Grewe, Dominik</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>O'Boyle, Michael F. P.</au><au>Wang, Zheng</au><au>Grewe, Dominik</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Portable mapping of data parallel programs to OpenCL for heterogeneous systems</atitle><btitle>Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)</btitle><stitle>CGO</stitle><date>2013-02-23</date><risdate>2013</risdate><spage>1</spage><epage>10</epage><pages>1-10</pages><isbn>9781467355247</isbn><isbn>1467355240</isbn><eisbn>9781467355254</eisbn><eisbn>1467355259</eisbn><abstract>General purpose GPU based systems are highly attractive as they give potentially massive performance at little cost. Realizing such potential is challenging due to the complexity of programming. This paper presents a compiler based approach to automatically generate optimized OpenCL code from data-parallel OpenMP programs for GPUs. Such an approach brings together the benefits of a clear high level language (OpenMP) and an emerging standard (OpenCL) for heterogeneous multi-cores. A key feature of our scheme is that it leverages existing transformations, especially data transformations, to improve performance on GPU architectures and uses predictive modeling to automatically determine if it is worthwhile running the OpenCL code on the GPU or OpenMP code on the multi-core host. 
We applied our approach to the entire NAS parallel benchmark suite and evaluated it on two distinct GPU based systems: Core i7/NVIDIA GeForce GTX 580 and Core i7/AMD Radeon 7970. We achieved average (up to) speedups of 4.51x and 4.20x (143x and 67x) respectively over a sequential baseline. This is, on average, a factor 1.63 and 1.56 times faster than a hand-coded, GPU-specific OpenCL implementation developed by independent expert programmers.</abstract><cop>Washington, DC, USA</cop><pub>IEEE Computer Society</pub><doi>10.1109/CGO.2013.6494993</doi><tpages>10</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier ISBN: 9781467355247
ispartof Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2013, p.1-10
issn
language eng
recordid cdi_ieee_primary_6494993
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Arrays
Benchmark testing
Feature extraction
GPU
Graphics processing units
Indexes
Kernel
Learning Mapping
Machine
OpenCL
Predictive models
Software and its engineering -- Software notations and tools -- Compilers
Software and its engineering -- Software notations and tools -- General programming languages -- Language features
Theory of computation -- Models of computation -- Concurrency
Theory of computation -- Models of computation -- Concurrency -- Parallel computing models
title Portable mapping of data parallel programs to OpenCL for heterogeneous systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T19%3A04%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Portable%20mapping%20of%20data%20parallel%20programs%20to%20OpenCL%20for%20heterogeneous%20systems&rft.btitle=Proceedings%20of%20the%202013%20IEEE/ACM%20International%20Symposium%20on%20Code%20Generation%20and%20Optimization%20(CGO)&rft.au=O'Boyle,%20Michael%20F.%20P.&rft.date=2013-02-23&rft.spage=1&rft.epage=10&rft.pages=1-10&rft.isbn=9781467355247&rft.isbn_list=1467355240&rft_id=info:doi/10.1109/CGO.2013.6494993&rft.eisbn=9781467355254&rft.eisbn_list=1467355259&rft_dat=%3Cacm_6IE%3Eacm_books_10_1109_CGO_2013_6494993%3C/acm_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a164t-e2a9572dd7858f87803b68f85d6d9686a8d8face4587054657ed454860f04adf3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6494993&rfr_iscdi=true