The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes
Published in: | The Journal of supercomputing 2019-12, Vol.75 (12), p.7778-7789 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | The parallel efficiency is raised by increasing the locality of calculation. With the locally recursive non-locally asynchronous algorithms method, we have constructed a new algorithm that improves the locality of the cross-stencil scheme implementation by the decomposition of the 3D computational domain in time and space. The decomposition is based on a tiling of the 3D1T space into hexahedrons that closely fit the octahedron shape. This shape leads to an algorithm that is less intuitive than the rectangular domain decomposition, but since it follows the natural shape of the dependency region of the cross stencil, it has advantages in data localization and parallelization possibilities. We show its construction, analysis, and implementation possibilities. We present the benchmark results and show that the algorithm follows quantitative estimations: The performance exceeds the memory-bound limit of the stepwise implementation and does not degrade when the whole domain data do not fit higher cache levels. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-018-2461-z |
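
The summary states that the octahedron is the natural shape of the dependency region of the cross stencil. The sketch below illustrates only that geometric claim, not the DiamondCandy algorithm itself: it assumes the common 7-point stencil (a cell plus its six face neighbours), and the function names are hypothetical, chosen for illustration.

```python
from itertools import product

# Assumption: the "cross stencil" is the 7-point stencil, i.e. the cell
# itself plus its six face neighbours along the x, y and z axes.
CROSS_STENCIL = [
    (0, 0, 0),
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def dependency_region(steps):
    """Cells whose initial values influence cell (0, 0, 0) after `steps` updates."""
    region = {(0, 0, 0)}
    for _ in range(steps):
        # Each time step widens the region by one stencil application.
        region = {
            (x + dx, y + dy, z + dz)
            for (x, y, z) in region
            for (dx, dy, dz) in CROSS_STENCIL
        }
    return region


def octahedron(radius):
    """Lattice points with |x| + |y| + |z| <= radius (a discrete octahedron)."""
    axis = range(-radius, radius + 1)
    return {
        (x, y, z)
        for x, y, z in product(axis, repeat=3)
        if abs(x) + abs(y) + abs(z) <= radius
    }


def bounding_cube_size(radius):
    """Number of cells in the smallest axis-aligned cube containing the octahedron."""
    return (2 * radius + 1) ** 3


if __name__ == "__main__":
    for n in (1, 2, 3):
        region = dependency_region(n)
        print(n, region == octahedron(n), len(region), bounding_cube_size(n))
```

Under these assumptions, the dependency region after n steps coincides with the discrete octahedron of radius n and occupies well under half of its bounding cube once n reaches 2, which is consistent with the summary's argument that a decomposition following this shape localizes data better than a rectangular one.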