
Accelerating LASG/IAP climate system ocean model version 3 for performance portability using Kokkos


Bibliographic Details
Published in: Future generation computer systems, 2024-11, Vol. 160, p. 901-917
Main Authors: Wei, Junlin; Lin, Pengfei; Jiang, Jinrong; Liu, Hailong; Zhao, Lian; Zhang, Yehong; Han, Xiang; Zhang, Feng; Huang, Jian; Wang, Yuzhu; Li, Youyun; Yu, Yue; Chi, Xuebin
Format: Article
Language: English
Description
Summary: In this paper, the performance portability of the LASG/IAP Climate System Ocean Model version 3 (LICOM3) is demonstrated using the C++ library Kokkos. Kokkos enables applications to run on a variety of High-Performance Computing (HPC) architectures by providing on-node parallelism. This study employs Kokkos to expose on-node parallelism and reuses the pre-existing Message-Passing Interface (MPI) code for internode parallelism. After porting to Kokkos, the single-source LICOM3 code runs successfully on ARM CPUs, Tesla V100 GPUs, and HIP-based GPUs. To this end, the characteristics and mechanisms of LICOM3 and Kokkos are considered, and the model is then optimized comprehensively in terms of data management, computation, and memory transmission. At a 1° resolution, the optimized Kokkos code is faster than the raw Compute Unified Device Architecture (CUDA), Heterogeneous-computing Interface for Portability (HIP), and OpenMP codes by factors of 1.9, 1.2, and 1.1, respectively. Further, it achieves 3.4 Simulated Years Per Day (SYPD) at a resolution of 0.05° when executed on 4096 HIP-based GPUs for large-scale simulations.
• Adapting a global ocean model to new performance portability technologies.
• The performance-portability code is compared to different raw back-end codes.
• The proposed code achieves 3.4 Simulated Years Per Day at a resolution of 0.05°.
ISSN: 0167-739X
DOI: 10.1016/j.future.2024.06.029