Accelerating LASG/IAP climate system ocean model version 3 for performance portability using Kokkos
Published in: | Future generation computer systems 2024-11, Vol.160, p.901-917 |
---|---|
Main Authors: | , , , , , , , , , , , , |
Format: | Article |
Language: | English |
Summary: | This paper demonstrates the performance portability of the LASG/IAP Climate System Ocean Model version 3 (LICOM3) using the C++ library Kokkos. Kokkos enables applications to run on various High-Performance Computing (HPC) architectures by providing on-node parallelism. This study employs Kokkos to expose on-node parallelism and reuses the pre-existing Message-Passing Interface (MPI) code for internode parallelism. After porting to Kokkos, the single-source LICOM3 code runs successfully on ARM CPUs, Tesla V100 GPUs, and HIP-based GPUs. To this end, the characteristics and mechanisms of LICOM3 and Kokkos are considered, and the model is then optimized comprehensively in terms of data management, computation, and memory transmission. At a 1° resolution, the optimized Kokkos code is faster by factors of 1.9, 1.2, and 1.1 than the raw Compute Unified Device Architecture (CUDA), Heterogeneous-computing Interface for Portability (HIP), and OpenMP codes, respectively. Further, it achieves 3.4 Simulated Years Per Day (SYPD) at a resolution of 0.05° when executed on 4096 HIP-based GPUs for large-scale simulations.
• Adapting a global ocean model to new performance portability technologies.
• The performance-portability code is compared to different raw back-end codes.
• The proposed code achieves 3.4 Simulated Years Per Day at a resolution of 0.05°. |
---|---|
ISSN: | 0167-739X |
DOI: | 10.1016/j.future.2024.06.029 |
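
The summary reports throughput in Simulated Years Per Day (SYPD) and speedups as ratios of run times. As a minimal sketch of how those two metrics are usually computed, assuming SYPD means simulated model years per 24 hours of wall-clock time, the arithmetic looks like this. The function names are hypothetical and the timings in the usage lines are illustrative only; they are not measurements from the paper.

```python
SECONDS_PER_DAY = 86_400.0


def sypd(simulated_years: float, wallclock_seconds: float) -> float:
    """Simulated years per wall-clock day (assumed definition of SYPD)."""
    if wallclock_seconds <= 0:
        raise ValueError("wallclock_seconds must be positive")
    return simulated_years * SECONDS_PER_DAY / wallclock_seconds


def wallclock_hours_per_simulated_year(sypd_value: float) -> float:
    """Wall-clock hours needed to simulate one model year at a given SYPD."""
    if sypd_value <= 0:
        raise ValueError("sypd_value must be positive")
    return 24.0 / sypd_value


def speedup(baseline_seconds: float, ported_seconds: float) -> float:
    """Speedup factor of a ported code relative to a baseline run time."""
    if baseline_seconds <= 0 or ported_seconds <= 0:
        raise ValueError("run times must be positive")
    return baseline_seconds / ported_seconds


if __name__ == "__main__":
    # 3.4 SYPD is the figure quoted in the summary; the rest is arithmetic.
    print(f"{wallclock_hours_per_simulated_year(3.4):.2f} h per simulated year")
    # Illustrative timings only (hypothetical, not taken from the paper).
    print(f"speedup: {speedup(190.0, 100.0):.1f}x")
```

Under this definition, the 3.4 SYPD quoted in the summary corresponds to roughly seven wall-clock hours per simulated year, and a speedup factor of 1.9 means the ported code needs a little over half the baseline run time.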