Design and initial performance of a high-level unstructured mesh framework on heterogeneous parallel systems

Bibliographic Details
Published in: Parallel Computing, 2013-11, Vol. 39 (11), p. 669-692
Main Authors: Mudalige, G.R., Giles, M.B., Thiyagalingam, J., Reguly, I.Z., Bertolli, C., Kelly, P.H.J., Trefethen, A.E.
Format: Article
Language: English
Description
Summary:
• Discuss the main design issues in parallelizing unstructured mesh applications.
• Present OP2 for developing applications for heterogeneous parallel systems.
• Analyze the performance gained with OP2 for two industrial-representative benchmarks.
• Compare runtime, scaling and runtime break-downs of the applications.
• Present energy consumption of OP2 applications on CPU and GPU clusters.

OP2 is a high-level domain specific library framework for the solution of unstructured mesh-based applications. It utilizes source-to-source translation and compilation so that a single application code written using the OP2 API can be transformed into multiple parallel implementations for execution on a range of back-end hardware platforms. In this paper we present the design and performance of OP2’s recent developments facilitating code generation and execution on distributed memory heterogeneous systems. OP2 targets the solution of numerical problems based on static unstructured meshes. We discuss the main design issues in parallelizing this class of applications. These include handling data dependencies in accessing indirectly referenced data and design considerations in generating code for execution on a cluster of multi-threaded CPUs and GPUs. Two representative CFD applications, written using the OP2 framework, are utilized to provide a contrasting benchmarking and performance analysis study on a number of heterogeneous systems including a large scale Cray XE6 system and a large GPU cluster. A range of performance metrics are benchmarked including runtime, scalability, achieved compute and bandwidth performance, runtime bottlenecks and systems energy consumption. We demonstrate that an application written once at a high-level using OP2 is easily portable across a wide range of contrasting platforms and is capable of achieving near-optimal performance without the intervention of the domain application programmer.
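Illustrative note: the summary refers to data dependencies that arise when data is accessed indirectly, for example when a loop over mesh edges increments values stored on the nodes those edges connect. Two edges that share a node would conflict if processed at the same time. The sketch below shows one widely used remedy, greedy edge coloring, in plain Python. It is not the OP2 API, the record does not state which scheme OP2 uses, and every name in it (`color_edges`, `indirect_increment`, `edge_to_node`, `edge_flux`) is hypothetical.

```python
# Illustrative sketch only; this is NOT the OP2 API.
# All function and variable names here are hypothetical.

def color_edges(edge_to_node):
    """Greedy coloring: edges that share a node receive different colors."""
    node_colors = {}  # node -> set of colors already used by its edges
    colors = []
    for nodes in edge_to_node:
        used = set()
        for n in nodes:
            used |= node_colors.get(n, set())
        c = 0
        while c in used:  # pick the smallest color not used at either node
            c += 1
        colors.append(c)
        for n in nodes:
            node_colors.setdefault(n, set()).add(c)
    return colors


def indirect_increment(edge_to_node, edge_flux, num_nodes):
    """Loop over edges, incrementing node data through the edge-to-node map."""
    colors = color_edges(edge_to_node)
    node_res = [0.0] * num_nodes
    for color in range(max(colors, default=-1) + 1):
        # Edges of one color touch disjoint nodes, so the iterations of
        # this inner loop could run in parallel without write conflicts.
        for e, (a, b) in enumerate(edge_to_node):
            if colors[e] == color:
                node_res[a] += edge_flux[e]
                node_res[b] -= edge_flux[e]
    return node_res


# A 4-node chain 0-1-2-3 with three edges (made-up data).
edge_to_node = [(0, 1), (1, 2), (2, 3)]
edge_flux = [1.0, 2.0, 3.0]
colors = color_edges(edge_to_node)
node_res = indirect_increment(edge_to_node, edge_flux, 4)
```

Here the middle edge shares a node with each neighbor, so it is placed in its own color group and processed in a separate pass from the other two edges.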
ISSN: 0167-8191, 1872-7336
DOI: 10.1016/j.parco.2013.09.004