
Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Published in: Parallel computing 2013-11, Vol.39 (11), p.669-692
Main Authors: Mudalige, G.R., Giles, M.B., Thiyagalingam, J., Reguly, I.Z., Bertolli, C., Kelly, P.H.J., Trefethen, A.E.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53
cites cdi_FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53
container_end_page 692
container_issue 11
container_start_page 669
container_title Parallel computing
container_volume 39
creator Mudalige, G.R.
Giles, M.B.
Thiyagalingam, J.
Reguly, I.Z.
Bertolli, C.
Kelly, P.H.J.
Trefethen, A.E.
description • Discuss the main design issues in parallelizing unstructured mesh applications.
• Present OP2 for developing applications for heterogeneous parallel systems.
• Analyze the performance gained with OP2 for two industrial-representative benchmarks.
• Compare runtime, scaling and runtime break-downs of the applications.
• Present energy consumption of OP2 applications on CPU and GPU clusters.
OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
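The data dependency mentioned in the abstract can be illustrated with a small, generic sketch. In an unstructured mesh, a loop over edges typically updates node data through an edge-to-node mapping, so two edges that share a node would conflict if processed at the same time. One common way to handle this is to color the edges so that no two edges of the same color touch the same node. The code below is only an illustration under that assumption: it does not use the OP2 API, and the names color_edges and edge_loop are hypothetical.

```python
from collections import defaultdict


def color_edges(edge_to_nodes):
    """Greedy coloring: edges that share a node receive different colors."""
    used = defaultdict(set)  # node index -> colors already taken by its edges
    colors = []
    for a, b in edge_to_nodes:
        c = 0
        while c in used[a] or c in used[b]:
            c += 1
        colors.append(c)
        used[a].add(c)
        used[b].add(c)
    return colors


def edge_loop(edge_to_nodes, edge_weight, node_value):
    """Add each edge's weight to both of its end nodes, one color at a time."""
    result = list(node_value)
    colors = color_edges(edge_to_nodes)
    for color in range(max(colors, default=-1) + 1):
        # Edges of one color touch disjoint nodes, so the iterations of this
        # inner loop are independent and could safely run in parallel.
        for e, (a, b) in enumerate(edge_to_nodes):
            if colors[e] == color:
                result[a] += edge_weight[e]
                result[b] += edge_weight[e]
    return result


# Hypothetical four-node ring mesh: edges given as (node, node) pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights = [1.0, 2.0, 3.0, 4.0]
print(color_edges(edges))
print(edge_loop(edges, weights, [0.0] * 4))
```

The sketch runs the colors sequentially in plain Python; in a real back end the inner loop for each color is where threads would be used.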
doi_str_mv 10.1016/j.parco.2013.09.004
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1678001714</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S0167819113001166</els_id><sourcerecordid>1506404490</sourcerecordid><originalsourceid>FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53</originalsourceid><addsrcrecordid>eNqFkbtOwzAUhi0EEqXwBCweWRKO41wHBlSuEhILzJbjHLcuTlzsBNS3x22ZYTrL9x_9F0IuGaQMWHm9TjfSK5dmwHgKTQqQH5EZq6ssqTgvj8ksUlVSs4adkrMQ1gBQ5jXMiL3DYJYDlUNHzWBGIy3doNfO93JQSJ2mkq7McpVY_EJLpyGMflLj5LGjPYYV1V72-O38B3UDXeGI3i1xQDcFGk1Ja6MqbMOIfTgnJ1ragBe_d07eH-7fFk_Jy-vj8-L2JVG8bMYkKwpkqmkL1WTY6aZqkWPLZI2QlZ3SvM7qIpdYa9VmLNcVgxi8ROgyDtgWfE6uDn833n1OGEbRm6DQWrn3JWIXNQCrWP4_WsSiIM8biCg_oMq7EDxqsfGml34rGIjdDGIt9jOI3QwCGhFniKqbgwpj4C-DXgRlMFbbGY9qFJ0zf-p_AD-Sk8w</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1506404490</pqid></control><display><type>article</type><title>Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems</title><source>ScienceDirect Journals</source><creator>Mudalige, G.R. ; Giles, M.B. ; Thiyagalingam, J. ; Reguly, I.Z. ; Bertolli, C. ; Kelly, P.H.J. ; Trefethen, A.E.</creator><creatorcontrib>Mudalige, G.R. ; Giles, M.B. ; Thiyagalingam, J. ; Reguly, I.Z. ; Bertolli, C. ; Kelly, P.H.J. ; Trefethen, A.E.</creatorcontrib><description>•Discuss the main design issues in parallelizing unstructured mesh applications.•Present OP2 for developing applications for heterogeneous parallel systems.•Analyze the performance gained with OP2 for two industrial-representative benchmarks.•Compare runtime, scaling and runtime break-downs of the applications.•Present energy consumption of OP2 applications on CPU and GPU clusters. OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. 
It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. 
We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.</description><identifier>ISSN: 0167-8191</identifier><identifier>EISSN: 1872-7336</identifier><identifier>DOI: 10.1016/j.parco.2013.09.004</identifier><language>eng</language><publisher>Elsevier B.V</publisher><subject>Active library ; Benchmarking ; Clusters ; Computation ; Design engineering ; Domain specific language ; GPU ; Heterogeneous systems ; Mathematical models ; OP2 ; Parallel processing ; Platforms ; Run time (computers) ; Unstructured mesh</subject><ispartof>Parallel computing, 2013-11, Vol.39 (11), p.669-692</ispartof><rights>2013 Elsevier B.V.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53</citedby><cites>FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27903,27904</link.rule.ids></links><search><creatorcontrib>Mudalige, G.R.</creatorcontrib><creatorcontrib>Giles, M.B.</creatorcontrib><creatorcontrib>Thiyagalingam, J.</creatorcontrib><creatorcontrib>Reguly, I.Z.</creatorcontrib><creatorcontrib>Bertolli, C.</creatorcontrib><creatorcontrib>Kelly, P.H.J.</creatorcontrib><creatorcontrib>Trefethen, A.E.</creatorcontrib><title>Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems</title><title>Parallel computing</title><description>•Discuss the main design issues in parallelizing unstructured mesh applications.•Present OP2 for developing applications for heterogeneous parallel 
systems.•Analyze the performance gained with OP2 for two industrial-representative benchmarks.•Compare runtime, scaling and runtime break-downs of the applications.•Present energy consumption of OP2 applications on CPU and GPU clusters. OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. 
We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.</description><subject>Active library</subject><subject>Benchmarking</subject><subject>Clusters</subject><subject>Computation</subject><subject>Design engineering</subject><subject>Domain specific language</subject><subject>GPU</subject><subject>Heterogeneous systems</subject><subject>Mathematical models</subject><subject>OP2</subject><subject>Parallel processing</subject><subject>Platforms</subject><subject>Run time (computers)</subject><subject>Unstructured mesh</subject><issn>0167-8191</issn><issn>1872-7336</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><recordid>eNqFkbtOwzAUhi0EEqXwBCweWRKO41wHBlSuEhILzJbjHLcuTlzsBNS3x22ZYTrL9x_9F0IuGaQMWHm9TjfSK5dmwHgKTQqQH5EZq6ssqTgvj8ksUlVSs4adkrMQ1gBQ5jXMiL3DYJYDlUNHzWBGIy3doNfO93JQSJ2mkq7McpVY_EJLpyGMflLj5LGjPYYV1V72-O38B3UDXeGI3i1xQDcFGk1Ja6MqbMOIfTgnJ1ragBe_d07eH-7fFk_Jy-vj8-L2JVG8bMYkKwpkqmkL1WTY6aZqkWPLZI2QlZ3SvM7qIpdYa9VmLNcVgxi8ROgyDtgWfE6uDn833n1OGEbRm6DQWrn3JWIXNQCrWP4_WsSiIM8biCg_oMq7EDxqsfGml34rGIjdDGIt9jOI3QwCGhFniKqbgwpj4C-DXgRlMFbbGY9qFJ0zf-p_AD-Sk8w</recordid><startdate>20131101</startdate><enddate>20131101</enddate><creator>Mudalige, G.R.</creator><creator>Giles, M.B.</creator><creator>Thiyagalingam, J.</creator><creator>Reguly, I.Z.</creator><creator>Bertolli, C.</creator><creator>Kelly, P.H.J.</creator><creator>Trefethen, A.E.</creator><general>Elsevier B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20131101</creationdate><title>Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel 
systems</title><author>Mudalige, G.R. ; Giles, M.B. ; Thiyagalingam, J. ; Reguly, I.Z. ; Bertolli, C. ; Kelly, P.H.J. ; Trefethen, A.E.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Active library</topic><topic>Benchmarking</topic><topic>Clusters</topic><topic>Computation</topic><topic>Design engineering</topic><topic>Domain specific language</topic><topic>GPU</topic><topic>Heterogeneous systems</topic><topic>Mathematical models</topic><topic>OP2</topic><topic>Parallel processing</topic><topic>Platforms</topic><topic>Run time (computers)</topic><topic>Unstructured mesh</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mudalige, G.R.</creatorcontrib><creatorcontrib>Giles, M.B.</creatorcontrib><creatorcontrib>Thiyagalingam, J.</creatorcontrib><creatorcontrib>Reguly, I.Z.</creatorcontrib><creatorcontrib>Bertolli, C.</creatorcontrib><creatorcontrib>Kelly, P.H.J.</creatorcontrib><creatorcontrib>Trefethen, A.E.</creatorcontrib><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Parallel computing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mudalige, G.R.</au><au>Giles, M.B.</au><au>Thiyagalingam, J.</au><au>Reguly, I.Z.</au><au>Bertolli, C.</au><au>Kelly, P.H.J.</au><au>Trefethen, 
A.E.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems</atitle><jtitle>Parallel computing</jtitle><date>2013-11-01</date><risdate>2013</risdate><volume>39</volume><issue>11</issue><spage>669</spage><epage>692</epage><pages>669-692</pages><issn>0167-8191</issn><eissn>1872-7336</eissn><abstract>•Discuss the main design issues in parallelizing unstructured mesh applications.•Present OP2 for developing applications for heterogeneous parallel systems.•Analyze the performance gained with OP2 for two industrial-representative benchmarks.•Compare runtime, scaling and runtime break-downs of the applications.•Present energy consumption of OP2 applications on CPU and GPU clusters. OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. 
A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.</abstract><pub>Elsevier B.V</pub><doi>10.1016/j.parco.2013.09.004</doi><tpages>24</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0167-8191
ispartof Parallel computing, 2013-11, Vol.39 (11), p.669-692
issn 0167-8191
1872-7336
language eng
recordid cdi_proquest_miscellaneous_1678001714
source ScienceDirect Journals
subjects Active library
Benchmarking
Clusters
Computation
Design engineering
Domain specific language
GPU
Heterogeneous systems
Mathematical models
OP2
Parallel processing
Platforms
Run time (computers)
Unstructured mesh
title Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T07%3A01%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Design%20and%20initial%20performance%20of%20a%20high-level%20unstructured%20mesh%20framework%20on%20heterogeneous%20parallel%20systems&rft.jtitle=Parallel%20computing&rft.au=Mudalige,%20G.R.&rft.date=2013-11-01&rft.volume=39&rft.issue=11&rft.spage=669&rft.epage=692&rft.pages=669-692&rft.issn=0167-8191&rft.eissn=1872-7336&rft_id=info:doi/10.1016/j.parco.2013.09.004&rft_dat=%3Cproquest_cross%3E1506404490%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c369t-255e1c9b5c92edf97be3eb1a8e026dcf382854ae8fcb214f7102016e0d230eb53%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1506404490&rft_id=info:pmid/&rfr_iscdi=true