An efficient GPU implementation of cyclic reduction solver for high-order compressible viscous flow simulations
Published in: | Computers & fluids 2014-03, Vol.92, p.160-171 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • A global-memory-based Cyclic Reduction (CR) algorithm is implemented on the GPU. • The proposed sort algorithm for memory transactions is well suited to the GPU. • The CR solver is applied to 2D & 3D compressible viscous flows with a compact scheme. • The GPU solver provides speedups of up to 15.2× in 2D and 20.3× in 3D simulations.
In this paper, the performance of the Cyclic Reduction (CR) algorithm for solving tridiagonal systems is improved through efficient global memory transactions on Graphics Processing Units (GPUs). To achieve maximum memory throughput with a lower computational runtime, two different sort algorithms are introduced for reordering the initial system of equations: direct and step-by-step. It is shown that the latter method is well suited to modern GPUs and achieves speedups of up to 3.47× in single precision and 2.1× in double precision compared to the CPU Thomas algorithm. With the new global memory implementation, the CR solver can run 2×–100× faster than previously reported parallel tridiagonal solvers. The CR solver is also applied to 2D & 3D compressible viscous flow simulations using the high-order compact finite-difference scheme. Here, the filtering, primitive-variable, and flux-derivative calculations are carried out with the parallel tridiagonal solver on the GPU device. The GPU-accelerated calculations achieve speedups of 1.9×–15.2× in 2D and 6.4×–20.3× in 3D simulations for different grid sizes compared to CPU computations. The computations are performed on an NVIDIA GTX480 GPU. The results are compared, in terms of calculation runtime, to those obtained on a single core of an Intel Core 2 Duo (2.7 GHz, 2 MB cache). |
---|---|
ISSN: | 0045-7930, 1879-0747 |
DOI: | 10.1016/j.compfluid.2013.12.011 |
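
The summary contrasts the serial Thomas algorithm with Cyclic Reduction (CR) for tridiagonal systems. As a rough illustration of the difference, the following is a minimal serial Python sketch of both methods. It is not the paper's GPU kernel and does not include its sort step. The function names `thomas` and `cyclic_reduction` and the restriction to systems of size 2^k - 1 are assumptions made here for brevity, and no pivoting is performed, so the sketch assumes a well-conditioned (for example, diagonally dominant) system.

```python
def thomas(a, b, c, d):
    """Serial Thomas algorithm (illustrative reference, no pivoting).

    a: sub-diagonal (a[0] must be 0), b: diagonal,
    c: super-diagonal (c[-1] must be 0), d: right-hand side.
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def cyclic_reduction(a, b, c, d):
    """Cyclic reduction for n = 2**k - 1 unknowns (assumption of this sketch).

    Each inner loop touches independent equations, which is the part a GPU
    implementation would run in parallel; here it runs serially.
    """
    n = len(d)
    if n < 1 or (n + 1) & n != 0:
        raise ValueError("this sketch requires n = 2**k - 1")
    a, b, c, d = list(a), list(b), list(c), list(d)   # work on copies

    # Forward reduction: eliminate the neighbours at distance s from every
    # equation i with (i + 1) divisible by 2 * s.
    s = 1
    while s < (n + 1) // 2:
        for i in range(2 * s - 1, n, 2 * s):
            alpha = a[i] / b[i - s]
            gamma = c[i] / b[i + s]
            b[i] -= alpha * c[i - s] + gamma * a[i + s]
            d[i] -= alpha * d[i - s] + gamma * d[i + s]
            a[i] = -alpha * a[i - s]           # now couples to x[i - 2s]
            c[i] = -gamma * c[i + s]           # now couples to x[i + 2s]
        s *= 2

    # Back substitution: solve the middle equation first, then fill in the
    # unknowns halfway between already-known ones.
    x = [0.0] * n
    s = (n + 1) // 2
    while s >= 1:
        for i in range(s - 1, n, 2 * s):
            rhs = d[i]
            if i - s >= 0:
                rhs -= a[i] * x[i - s]
            if i + s < n:
                rhs -= c[i] * x[i + s]
            x[i] = rhs / b[i]
        s //= 2
    return x
```

Both functions take the same four coefficient lists and should agree to rounding error on a diagonally dominant system, which makes `thomas` a convenient check for `cyclic_reduction`. CR does more arithmetic in total than the Thomas algorithm, but each of its levels consists of independent updates, which is what makes it a candidate for parallel hardware.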