Divergence analysis

Bibliographic Details
Published in: ACM Transactions on Programming Languages and Systems, 2013-12, Vol. 35 (4), p. 1-36, Article 13
Main Authors: Sampaio, Diogo; Souza, Rafael Martins de; Collange, Caroline; Pereira, Fernando Magno Quintão
Format: Article
Language: English
container_end_page 36
container_issue 4
container_start_page 1
container_title ACM transactions on programming languages and systems
container_volume 35
creator Sampaio, Diogo
Souza, Rafael Martins de
Collange, Caroline
Pereira, Fernando Magno Quintão
description Growing interest in graphics processing units has brought renewed attention to the Single Instruction Multiple Data (SIMD) execution model. SIMD machines give application developers tremendous computational power; however, programming them is still challenging. In particular, developers must deal with memory and control-flow divergences. These phenomena stem from a condition that we call data divergence, which occurs whenever two processing elements (PEs) see the same variable name holding different values. This article introduces divergence analysis, a static analysis that discovers data divergences. This analysis, currently deployed in an industrial quality compiler, is useful in several ways: it improves the translation of SIMD code to non-SIMD CPUs, it helps developers to manually improve their SIMD applications, and it also guides the automatic optimization of SIMD programs. We demonstrate this last point by introducing the notion of a divergence-aware register spiller. This spiller uses information from our analysis to either rematerialize or share common data between PEs. As a testimony of its effectiveness, we have tested it on a suite of 395 CUDA kernels from well-known benchmarks. The divergence-aware spiller produces GPU code that is 26.21% faster than the code produced by the register allocator used in the baseline compiler.
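The notion of data divergence described above can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a hypothetical toy IR in which each instruction is a `(dest, operands)` pair, treats a thread identifier `tid` as the only source of divergence, and follows data dependences only. A full analysis would also have to account for control dependences. The names `divergent_variables`, `program`, and `tid` are illustrative assumptions.

```python
def divergent_variables(assignments, seeds):
    """Return the variables that may hold different values on different PEs.

    assignments: list of (dest, operands) pairs, one per instruction
                 (a hypothetical toy IR, not the paper's representation).
    seeds: variables known to differ between PEs, e.g. a thread identifier.
    """
    divergent = set(seeds)
    changed = True
    # Iterate to a fixed point so the result does not depend on instruction order.
    while changed:
        changed = False
        for dest, operands in assignments:
            if dest not in divergent and any(op in divergent for op in operands):
                divergent.add(dest)
                changed = True
    return divergent


program = [
    ("base", ["n"]),           # base depends only on n, identical on every PE
    ("idx", ["base", "tid"]),  # idx depends on the thread identifier
    ("val", ["idx"]),          # val is computed from a divergent index
    ("step", ["base"]),        # step depends only on uniform data
]

result = divergent_variables(program, {"tid"})
print(sorted(result))  # prints ['idx', 'tid', 'val']
```

Variables left out of the result (`base` and `step` here) hold the same value on every PE, which is the kind of information a divergence-aware spiller could use to share or rematerialize data.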
doi_str_mv 10.1145/2523815
format article
identifier ISSN: 0164-0925
ispartof ACM transactions on programming languages and systems, 2013-12, Vol.35 (4), p.1-36, Article 13
issn 0164-0925
1558-4593
language eng
recordid cdi_hal_primary_oai_HAL_hal_00909072v3
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); BSC - Ebsco (Business Source Ultimate)
subjects Applied sciences
Benchmarks
Compilers
Computer Science
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Computer systems performance. Reliability
Developers
Divergence
Exact sciences and technology
Hardware Architecture
Optimization
Programming
Programming Languages
Programming theory
Registers
Software
Software and its engineering
Software notations and tools
Theoretical computing
Translations
title Divergence analysis
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T10%3A59%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Divergence%20analysis&rft.jtitle=ACM%20transactions%20on%20programming%20languages%20and%20systems&rft.au=Sampaio,%20Diogo&rft.date=2013-12-01&rft.volume=35&rft.issue=4&rft.spage=1&rft.epage=36&rft.pages=1-36&rft.artnum=13&rft.issn=0164-0925&rft.eissn=1558-4593&rft.coden=ATPSDT&rft_id=info:doi/10.1145/2523815&rft_dat=%3Cproquest_hal_p%3E1671579834%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a374t-1c1e6eb7a3f415f581c4e76dc68f883424b38fd3b8c90ded43c512ac88db5e4f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1671579834&rft_id=info:pmid/&rfr_iscdi=true