An improved parallelism scheme for deterministic discrete ordinates transport

Bibliographic Details
Published in: The International Journal of High Performance Computing Applications, 2018-07, Vol. 32 (4), p. 555-569
Main Authors: Deakin, Tom; McIntosh-Smith, Simon; Martineau, Matt; Gaudin, Wayne
Format: Article
Language: English
container_end_page 569
container_issue 4
container_start_page 555
container_title The International Journal of High Performance Computing Applications
container_volume 32
creator Deakin, Tom
McIntosh-Smith, Simon
Martineau, Matt
Gaudin, Wayne
description In this paper we demonstrate techniques for increasing the node-level parallelism of a deterministic discrete ordinates neutral particle transport algorithm on a structured mesh to exploit many-core technologies. Transport calculations form a large part of the computational workload of physical simulations, so good performance is vital for the simulations to complete in reasonable time. We demonstrate our approach using the SNAP mini-app, which gives a simplified implementation of the full transport algorithm but remains similar enough to the real algorithm to act as a useful proxy for research purposes. We present an OpenCL implementation of our improved algorithm which achieves a speedup of up to 2.5× on a many-core GPGPU device compared to a state-of-the-art multi-core node for the transport sweep, and up to 4× compared to the multi-core CPUs in the largest GPU-enabled supercomputer; this is the first time this scale of speedup has been achieved for algorithms of this class. We then discuss ways to express our scheme in OpenMP 4.0 and demonstrate the performance on an Intel Knights Corner Xeon Phi compared to the original scheme.
doi_str_mv 10.1177/1094342016668978
format article
identifier ISSN: 1094-3420
ispartof The International Journal of High Performance Computing Applications, 2018-07, Vol.32 (4), p.555-569
issn 1094-3420
eissn 1741-2846
language eng
recordid cdi_crossref_primary_10_1177_1094342016668978
source Sage Journals Online
title An improved parallelism scheme for deterministic discrete ordinates transport
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T02%3A36%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-sage_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20improved%20parallelism%20scheme%20for%20deterministic%20discrete%20ordinates%20transport&rft.jtitle=The%20International%20Journal%20of%20High%20Performance%20Computing%20Applications&rft.au=Deakin,%20Tom&rft.date=2018-07&rft.volume=32&rft.issue=4&rft.spage=555&rft.epage=569&rft.pages=555-569&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/1094342016668978&rft_dat=%3Csage_cross%3E10.1177_1094342016668978%3C/sage_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c323t-c6c49a2c1f3280e6c4f7eafba3670544be649779deac3b4acca38eb5018fd00f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_sage_id=10.1177_1094342016668978&rfr_iscdi=true