
The DiamondCandy LRnLA algorithm: raising efficiency of the 3D cross-stencil schemes


Bibliographic Details
Published in: The Journal of Supercomputing, December 2019, Vol. 75 (12), pp. 7778-7789
Main Authors: Perepelkina, Anastasia; Levchenko, Vadim; Khilkov, Sergey
Format: Article
Language: English
Description
Summary: Parallel efficiency is raised by increasing the locality of calculation. Using the method of locally recursive non-locally asynchronous (LRnLA) algorithms, we have constructed a new algorithm that improves the locality of cross-stencil scheme implementations by decomposing the 3D computational domain in both time and space. The decomposition is based on a tiling of the 3D1T space (three spatial dimensions plus time) into hexahedrons that closely fit the octahedron shape. This shape leads to an algorithm that is less intuitive than rectangular domain decomposition, but because it follows the natural shape of the dependency region of the cross stencil, it has advantages in data localization and parallelization possibilities. We show its construction, analysis, and implementation possibilities. We present benchmark results and show that the algorithm follows the quantitative estimates: its performance exceeds the memory-bound limit of the stepwise implementation and does not degrade when the data of the whole domain do not fit into the higher cache levels.
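The statement that the octahedron "follows the natural shape of the dependency region of the cross stencil" can be checked with a short sketch. It assumes the simplest 3D cross stencil (a cell plus its six face neighbours); the names CROSS, dependency_region and octahedron are hypothetical, and the code illustrates only the geometric fact the decomposition builds on, not the DiamondCandy algorithm itself.

```python
# Minimal sketch (not the DiamondCandy implementation): the dependency
# region of a 3D cross stencil after t time steps is the discrete
# octahedron |dx| + |dy| + |dz| <= t.

# Assumed stencil: the cell itself plus its six face neighbours
# (a first-order 7-point cross; wider crosses are also possible).
CROSS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0),
         (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def dependency_region(t):
    """Cells at step 0 that influence the cell (0, 0, 0) at step t."""
    region = {(0, 0, 0)}
    for _ in range(t):
        # Go one time step back: each cell in the region depends on
        # every cell covered by its stencil.
        region = {(x + dx, y + dy, z + dz)
                  for (x, y, z) in region
                  for (dx, dy, dz) in CROSS}
    return region


def octahedron(t):
    """Discrete octahedron (L1 ball) of radius t centred at the origin."""
    r = range(-t, t + 1)
    return {(x, y, z) for x in r for y in r for z in r
            if abs(x) + abs(y) + abs(z) <= t}


if __name__ == "__main__":
    for t in range(5):
        reg = dependency_region(t)
        assert reg == octahedron(t)
        # Closed-form size of the 3D L1 ball vs. its bounding cube.
        formula = (2 * t + 1) * (2 * t * t + 2 * t + 3) // 3
        print(t, len(reg), formula, (2 * t + 1) ** 3)
```

For comparison, a cubic region of the same radius holds (2t+1)^3 cells, roughly six times more than the octahedron for large t, since the continuous volumes are (4/3)t^3 and 8t^3.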
ISSN: 0920-8542 (print), 1573-0484 (online)
DOI: 10.1007/s11227-018-2461-z