A unified object-oriented framework for CPU+GPU explicit hyperbolic solvers

Bibliographic Details
Published in: Advances in engineering software (1992), 2020-10, Vol. 148, Article 102802
Main Authors: Conde, Daniel A.S.; Canelas, Ricardo B.; Ferreira, Rui M.L.
Format: Article
Language: English
Description
Summary:
• Development of CPU+GPU hyperbolic solvers is unified under a simple object-oriented framework.
• Benchmarks show that data-structure layouts have a significant impact on code scalability.
• Coherent memory-space ordering and thread-interleaving techniques are used to improve constant and sparse workload performance.
• CPU parallel performance reveals supra-linear speedups on hyper-threading-enabled processors.
• Speedup on GPUs is around 40x relative to sequential CPU performance for the tested setup.

A unified design solution for heterogeneous explicit hyperbolic solvers is introduced. The proposed design is entirely cross-compatible between CPUs and GPUs, through an intuitive object-oriented approach. The advantages of a unified CPU+GPU development approach are discussed and exemplified, and a complete description of the data and code structures is provided and benchmarked. The benefits of different object-oriented designs are quantified under static and dynamic loads in terms of parallel performance and scalability. A fair comparison with graphics processors provides a realistic measure of achievable GPU implementation benefits. Automatically and manually tuned GPU executions are compared, and the tuning approach is shown to have a significant impact on the obtained performance. Overall, the proposed design combines good sequential performance with supra-linear scalability on modern CPUs. On GPUs, execution is shown to be up to 40 times faster than its single-threaded CPU counterpart, opening a wider range of applicable model scales and resolutions.
ISSN: 0965-9978
DOI: 10.1016/j.advengsoft.2020.102802