An autotuning approach to select the inter-GPU communication library on heterogeneous systems
Published in: The Journal of Supercomputing, 2025, Vol. 81 (1), Article 283
Main Authors:
Format: Article
Language: English
Summary: In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries (such as CUDA-Aware MPI or NCCL) are used with a set of routines to perform the numerical operations among the GPUs located on the compute nodes. The main objective is the selection of the most appropriate communication library, the number of GPUs to be used and the workload to be distributed among them, in order to reduce the cost of data movements, which represents a large percentage of the total execution time. To this end, a hierarchical modelling of the execution time of each routine to be optimised is proposed, combining experimental and theoretical approaches. The results show that near-optimal decisions are taken in all the scenarios analysed.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-024-06794-3
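
The summary above describes choosing a communication library, a GPU count and a workload split by modelling execution time. As a rough, hypothetical illustration of the simplest ingredient of such autotuning, an empirical benchmark-and-select loop over candidate configurations, the sketch below times placeholder benchmark callables and keeps the fastest (library, GPU count) pair for one problem size. The function names, candidate lists and dummy workloads are assumptions for illustration only, not the routines or the hierarchical model from the paper.

```python
import time
from itertools import product


def autotune(benchmarks, gpu_counts, problem_size, repeats=3):
    """Pick the fastest (library, gpu_count) pair for one problem size.

    `benchmarks` maps a library name (e.g. "cuda_aware_mpi", "nccl") to a
    callable benchmark(gpu_count, problem_size) that runs the routine once.
    The callables here are placeholders, not the paper's actual routines.
    """
    best = None
    for lib, gpus in product(benchmarks, gpu_counts):
        run = benchmarks[lib]
        # Keep the best of several repetitions to reduce timing noise.
        elapsed = min(_timed(run, gpus, problem_size) for _ in range(repeats))
        if best is None or elapsed < best[0]:
            best = (elapsed, lib, gpus)
    return {"library": best[1], "gpus": best[2], "seconds": best[0]}


def _timed(run, gpus, problem_size):
    start = time.perf_counter()
    run(gpus, problem_size)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Dummy stand-ins for real multi-GPU benchmarks (purely illustrative).
    fake = {
        "cuda_aware_mpi": lambda gpus, n: sum(range(n // gpus)),
        "nccl": lambda gpus, n: sum(range(n // (2 * gpus))),
    }
    print(autotune(fake, gpu_counts=[2, 4, 8], problem_size=1_000_000))
```

In the approach summarised above, a hierarchical execution-time model combining experimental and theoretical components guides the selection instead; the exhaustive loop here is only a naive baseline for comparison.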