A unified object-oriented framework for CPU+GPU explicit hyperbolic solvers
Published in: | Advances in engineering software (1992) 2020-10, Vol.148, p.102802, Article 102802 |
---|---|
Format: | Article |
Language: | English |
Summary: | • Development of CPU+GPU hyperbolic solvers is unified under a simple object-oriented framework.
• Benchmarks show that data-structure layouts significantly affect code scalability.
• Coherent memory-space ordering and thread-interleaving techniques are used to improve constant and sparse workload performance.
• CPU parallel performance reveals supra-linear speedups on hyper-threading enabled processors.
• Speedup on GPUs is around 40x relative to sequential CPU performance for the tested setup.
A unified design solution for heterogeneous explicit hyperbolic solvers is introduced. The proposed design is entirely cross-compatible between CPUs and GPUs, through an intuitive object-oriented approach. The advantages of a unified CPU+GPU development approach are discussed and exemplified, and the data and code structures are fully described and benchmarked. The benefits of different object-oriented designs are quantified under static and dynamic loads in terms of parallel performance and scalability. A fair comparison with graphics processors provides a realistic measure of achievable GPU implementation benefits. Automatically and manually tuned GPU executions are also compared, and the choice between them is shown to have a significant impact on the obtained performance. Overall, the proposed design combines good sequential performance with supra-linear scalability on modern CPUs. On GPUs, execution is shown to be up to 40 times faster than its single-threaded counterpart, opening up a wider range of applicable model scales and resolutions. |
---|---|
ISSN: | 0965-9978 |
DOI: | 10.1016/j.advengsoft.2020.102802 |