Divergence analysis
Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal wi...
Published in: | ACM transactions on programming languages and systems, 2013-12, Vol.35 (4), p.1-36, Article 13 |
---|---|
Main Authors: | Sampaio, Diogo ; Souza, Rafael Martins de ; Collange, Caroline ; Pereira, Fernando Magno Quintão |
Format: | Article |
Language: | English |
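Subjects: | Applied sciences ; Benchmarks ; Compilers ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Computer systems performance. Reliability ; Developers ; Divergence ; Exact sciences and technology ; Hardware Architecture ; Optimization ; Programming ; Programming Languages ; Programming theory ; Registers ; Software ; Software and its engineering ; Software notations and tools ; Theoretical computing ; Translations |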
cited_by | cdi_FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3 |
---|---|
cites | cdi_FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3 |
container_end_page | 36 |
container_issue | 4 |
container_start_page | 1 |
container_title | ACM transactions on programming languages and systems |
container_volume | 35 |
creator | Sampaio, Diogo ; Souza, Rafael Martins de ; Collange, Caroline ; Pereira, Fernando Magno Quintão |
description | Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler. |
doi_str_mv | 10.1145/2523815 |
format | article |
fullrecord | <record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_00909072v3</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1671579834</sourcerecordid><originalsourceid>FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3</originalsourceid><addsrcrecordid>eNpF0E1Lw0AQBuBFFKxVPHjz5EXUQ3RnP5LJsdSPCgUvel4mm12NpEndbQv9901JqMxhYObhHRjGLoE_Aij9JLSQCPqIjUBrTJTO5TEbcUhVwnOhT9lZjL-cc0CNI3b1XG1c-HaNdTfUUL2NVTxnJ57q6C6GPmZfry-f01ky_3h7n07mCclMrRKw4FJXZCS9Au01glUuS0ubokeUSqhCoi9lgTbnpSuVtBoEWcSy0E55OWYPfe4P1WYZqgWFrWmpMrPJ3OxnnOddZWIjO3vf22Vo_9YursyiitbVNTWuXUcDaQY6y7uzHb3rqQ1tjMH5QzZws3-RGV7UydshlKKl2gdqbBUPXCCkUoi8c9e9I7v43w4hO6_wakU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1671579834</pqid></control><display><type>article</type><title>Divergence analysis</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><source>BSC - Ebsco (Business Source Ultimate)</source><creator>Sampaio, Diogo ; Souza, Rafael Martins de ; Collange, Caroline ; Pereira, Fernando Magno Quintão</creator><creatorcontrib>Sampaio, Diogo ; Souza, Rafael Martins de ; Collange, Caroline ; Pereira, Fernando Magno Quintão</creatorcontrib><description>Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. 
This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.</description><identifier>ISSN: 0164-0925</identifier><identifier>EISSN: 1558-4593</identifier><identifier>DOI: 10.1145/2523815</identifier><identifier>CODEN: ATPSDT</identifier><language>eng</language><publisher>New York, NY, USA: ACM</publisher><subject>Applied sciences ; Benchmarks ; Compilers ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Computer systems performance. 
Reliability ; Developers ; Divergence ; Exact sciences and technology ; Hardware Architecture ; Optimization ; Programming ; Programming Languages ; Programming theory ; Registers ; Software ; Software and its engineering ; Software notations and tools ; Theoretical computing ; Translations</subject><ispartof>ACM transactions on programming languages and systems, 2013-12, Vol.35 (4), p.1-36, Article 13</ispartof><rights>ACM</rights><rights>2015 INIST-CNRS</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3</citedby><cites>FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=28163229$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://inria.hal.science/hal-00909072$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Sampaio, Diogo</creatorcontrib><creatorcontrib>Souza, Rafael Martins de</creatorcontrib><creatorcontrib>Collange, Caroline</creatorcontrib><creatorcontrib>Pereira, Fernando Magno Quintão</creatorcontrib><title>Divergence analysis</title><title>ACM transactions on programming languages and systems</title><addtitle>ACM TOPLAS</addtitle><description>Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. 
In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.</description><subject>Applied sciences</subject><subject>Benchmarks</subject><subject>Compilers</subject><subject>Computer Science</subject><subject>Computer science; control theory; systems</subject><subject>Computer systems and distributed systems. User interface</subject><subject>Computer systems performance. 
Reliability</subject><subject>Developers</subject><subject>Divergence</subject><subject>Exact sciences and technology</subject><subject>Hardware Architecture</subject><subject>Optimization</subject><subject>Programming</subject><subject>Programming Languages</subject><subject>Programming theory</subject><subject>Registers</subject><subject>Software</subject><subject>Software and its engineering</subject><subject>Software notations and tools</subject><subject>Theoretical computing</subject><subject>Translations</subject><issn>0164-0925</issn><issn>1558-4593</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNpF0E1Lw0AQBuBFFKxVPHjz5EXUQ3RnP5LJsdSPCgUvel4mm12NpEndbQv9901JqMxhYObhHRjGLoE_Aij9JLSQCPqIjUBrTJTO5TEbcUhVwnOhT9lZjL-cc0CNI3b1XG1c-HaNdTfUUL2NVTxnJ57q6C6GPmZfry-f01ky_3h7n07mCclMrRKw4FJXZCS9Au01glUuS0ubokeUSqhCoi9lgTbnpSuVtBoEWcSy0E55OWYPfe4P1WYZqgWFrWmpMrPJ3OxnnOddZWIjO3vf22Vo_9YursyiitbVNTWuXUcDaQY6y7uzHb3rqQ1tjMH5QzZws3-RGV7UydshlKKl2gdqbBUPXCCkUoi8c9e9I7v43w4hO6_wakU</recordid><startdate>20131201</startdate><enddate>20131201</enddate><creator>Sampaio, Diogo</creator><creator>Souza, Rafael Martins de</creator><creator>Collange, Caroline</creator><creator>Pereira, Fernando Magno Quintão</creator><general>ACM</general><general>Association for Computing Machinery</general><scope>IQODW</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>1XC</scope><scope>VOOES</scope></search><sort><creationdate>20131201</creationdate><title>Divergence analysis</title><author>Sampaio, Diogo ; Souza, Rafael Martins de ; Collange, Caroline ; Pereira, Fernando Magno 
Quintão</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Applied sciences</topic><topic>Benchmarks</topic><topic>Compilers</topic><topic>Computer Science</topic><topic>Computer science; control theory; systems</topic><topic>Computer systems and distributed systems. User interface</topic><topic>Computer systems performance. Reliability</topic><topic>Developers</topic><topic>Divergence</topic><topic>Exact sciences and technology</topic><topic>Hardware Architecture</topic><topic>Optimization</topic><topic>Programming</topic><topic>Programming Languages</topic><topic>Programming theory</topic><topic>Registers</topic><topic>Software</topic><topic>Software and its engineering</topic><topic>Software notations and tools</topic><topic>Theoretical computing</topic><topic>Translations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Sampaio, Diogo</creatorcontrib><creatorcontrib>Souza, Rafael Martins de</creatorcontrib><creatorcontrib>Collange, Caroline</creatorcontrib><creatorcontrib>Pereira, Fernando Magno Quintão</creatorcontrib><collection>Pascal-Francis</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><jtitle>ACM transactions on programming languages and 
systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Sampaio, Diogo</au><au>Souza, Rafael Martins de</au><au>Collange, Caroline</au><au>Pereira, Fernando Magno Quintão</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Divergence analysis</atitle><jtitle>ACM transactions on programming languages and systems</jtitle><stitle>ACM TOPLAS</stitle><date>2013-12-01</date><risdate>2013</risdate><volume>35</volume><issue>4</issue><spage>1</spage><epage>36</epage><pages>1-36</pages><artnum>13</artnum><issn>0164-0925</issn><eissn>1558-4593</eissn><coden>ATPSDT</coden><abstract>Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. 
The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.</abstract><cop>New York, NY, USA</cop><pub>ACM</pub><doi>10.1145/2523815</doi><tpages>36</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0164-0925 |
ispartof | ACM transactions on programming languages and systems, 2013-12, Vol.35 (4), p.1-36, Article 13 |
issn | 0164-0925 ; 1558-4593 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_00909072v3 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); BSC - Ebsco (Business Source Ultimate) |
subjects | Applied sciences ; Benchmarks ; Compilers ; Computer Science ; Computer science; control theory; systems ; Computer systems and distributed systems. User interface ; Computer systems performance. Reliability ; Developers ; Divergence ; Exact sciences and technology ; Hardware Architecture ; Optimization ; Programming ; Programming Languages ; Programming theory ; Registers ; Software ; Software and its engineering ; Software notations and tools ; Theoretical computing ; Translations |
title | Divergence analysis |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T10%3A59%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Divergence%20analysis&rft.jtitle=ACM%20transactions%20on%20programming%20languages%20and%20systems&rft.au=Sampaio,%20Diogo&rft.date=2013-12-01&rft.volume=35&rft.issue=4&rft.spage=1&rft.epage=36&rft.pages=1-36&rft.artnum=13&rft.issn=0164-0925&rft.eissn=1558-4593&rft.coden=ATPSDT&rft_id=info:doi/10.1145/2523815&rft_dat=%3Cproquest_hal_p%3E1671579834%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671579834&rft_id=info:pmid/&rfr_iscdi=true |