
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs

Bibliographic Details
Main Authors: Papakonstantinou, A., Gururaj, K., Stratton, J.A., Chen, D., Cong, J., Hwu, W.-M.W.
Format: Conference Proceeding
Language: English
container_end_page 42
container_start_page 35
container_title 2009 IEEE 7th Symposium on Application Specific Processors
creator Papakonstantinou, A.
Gururaj, K.
Stratton, J.A.
Chen, D.
Cong, J.
Hwu, W.-M.W.
description As growing power dissipation and thermal effects disrupted the rising clock-frequency trend and threatened to annul Moore's law, the computing industry has turned to parallel processing as its route to higher performance. The rise of multicore systems in all domains of computing has opened the door to heterogeneous multiprocessors, where processors with different compute characteristics can be combined to boost the performance per watt of different application kernels. GPUs and FPGAs are becoming very popular in PC-based heterogeneous systems for speeding up compute-intensive kernels of scientific, imaging and simulation applications. GPUs can execute hundreds of concurrent threads, while FPGAs provide customized concurrency for highly parallel kernels. However, exploiting the parallelism available in these applications is currently not a push-button task. Often the programmer has to expose the application's fine- and coarse-grained parallelism by using special APIs. CUDA is such a parallel computing API; it is driven by the GPU industry and is gaining significant popularity. In this work, we adapt the CUDA programming model into a new FPGA design flow called FCUDA, which efficiently maps the coarse- and fine-grained parallelism exposed in CUDA onto the reconfigurable fabric. Our CUDA-to-FPGA flow employs AutoPilot, an advanced high-level synthesis tool that enables high-abstraction FPGA programming. FCUDA is based on a source-to-source compilation that transforms the SPMD CUDA thread blocks into parallel C code for AutoPilot. We describe the details of our CUDA-to-FPGA flow and demonstrate the highly competitive performance of the resulting customized FPGA multi-core accelerators. To the best of our knowledge, this is the first CUDA-to-FPGA flow to demonstrate the applicability and potential advantage of using the CUDA programming model for high-performance computing in FPGAs.
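The core idea in the abstract, turning the implicit SPMD threads of a CUDA thread block into explicit C code that a high-level synthesis tool can schedule, can be illustrated with a small sketch. This is not FCUDA's actual output: the SAXPY kernel, the `dim3_x` type and the `saxpy_block`/`saxpy_grid` function names are hypothetical, no AutoPilot directives are shown, and the grid-level loop simply runs thread blocks in sequence where a real flow would map them onto parallel accelerator cores.

```c
/* Hypothetical CUDA kernel used as the input (shown for reference only):
 *
 *   __global__ void saxpy(int n, float a, const float *x, float *y) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;
 *       if (i < n) y[i] = a * x[i] + y[i];
 *   }
 */
#include <assert.h>

/* Hypothetical stand-in for CUDA's dim3; only the x dimension is modelled. */
typedef struct { int x; } dim3_x;

/* One thread block: the implicit SPMD threads become an explicit loop over
 * threadIdx.x. Iterations are independent, so an HLS tool is free to unroll
 * or pipeline this loop (fine-grained parallelism). */
static void saxpy_block(int n, float a, const float *x, float *y,
                        dim3_x blockIdx, dim3_x blockDim) {
    for (int tx = 0; tx < blockDim.x; ++tx) {   /* tx plays threadIdx.x */
        int i = blockIdx.x * blockDim.x + tx;
        if (i < n) {                            /* same bounds guard as the kernel */
            y[i] = a * x[i] + y[i];
        }
    }
}

/* The grid: thread blocks are independent (coarse-grained parallelism), so
 * they could be spread over several parallel hardware cores. This sketch
 * just executes them one after another. */
void saxpy_grid(int n, float a, const float *x, float *y,
                dim3_x gridDim, dim3_x blockDim) {
    assert(blockDim.x > 0 && gridDim.x >= 0);
    for (int bx = 0; bx < gridDim.x; ++bx) {
        dim3_x blockIdx = { bx };
        saxpy_block(n, a, x, y, blockIdx, blockDim);
    }
}
```

For example, with n = 10, blockDim.x = 4 and gridDim.x = 3, the last block covers indices 8 to 11, and the `i < n` guard masks out the two threads that fall past the end of the arrays, exactly as it would on a GPU.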
doi_str_mv 10.1109/SASP.2009.5226333
format conference_proceeding
date 2009-07
publisher IEEE
identifier ISBN: 9781424449392
identifier EISBN: 1424449383
identifier EISBN: 9781424449385
identifier LCCN: 2009906201
pages 35-42 (8 pages)
identifier ISBN: 1424449391
ispartof 2009 IEEE 7th Symposium on Application Specific Processors, 2009, p.35-42
language eng
recordid cdi_ieee_primary_5226333
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Clocks
Computer industry
Concurrent computing
Field programmable gate arrays
Frequency
Kernel
Moore's Law
Parallel processing
Power dissipation
title FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T20%3A03%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=FCUDA:%20Enabling%20efficient%20compilation%20of%20CUDA%20kernels%20onto%20FPGAs&rft.btitle=2009%20IEEE%207th%20Symposium%20on%20Application%20Specific%20Processors&rft.au=Papakonstantinou,%20A.&rft.date=2009-07&rft.spage=35&rft.epage=42&rft.pages=35-42&rft.isbn=1424449391&rft.isbn_list=9781424449392&rft_id=info:doi/10.1109/SASP.2009.5226333&rft.eisbn=1424449383&rft.eisbn_list=9781424449385&rft_dat=%3Cieee_6IE%3E5226333%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c289t-9a8cf9919a30c857c63a412bef295a25440f711457f14ea5007e14ef7eb96fd43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5226333&rfr_iscdi=true