An autotuning approach to select the inter-GPU communication library on heterogeneous systems
Published in: The Journal of Supercomputing, 2025, Vol. 81 (1), Article 283
Main Authors:
Format: Article
Language: English
Summary: In this work, an automatic optimisation approach for parallel routines on multi-GPU systems is presented. Several inter-GPU communication libraries (such as CUDA-Aware MPI or NCCL) are used with a set of routines to perform the numerical operations among the GPUs located on the compute nodes. The main objective is the selection of the most appropriate communication library, the number of GPUs to be used and the workload to be distributed among them, in order to reduce the cost of data movements, which represents a large percentage of the total execution time. To this end, a hierarchical modelling of the execution time of each routine to be optimised is proposed, combining experimental and theoretical approaches. The results show that near-optimal decisions are taken in all the scenarios analysed.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-024-06794-3
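
The summary above describes choosing a communication library, a GPU count and a workload split by modelling execution time. As a rough, hypothetical illustration of the simplest ingredient of such autotuning, an empirical benchmark-and-select loop over candidate configurations, the sketch below times placeholder benchmark callables and keeps the fastest (library, GPU count) pair for one problem size. The function names, candidate lists and dummy workloads are assumptions for illustration only, not the routines or the hierarchical model from the paper.

```python
import time
from itertools import product


def autotune(benchmarks, gpu_counts, problem_size, repeats=3):
    """Pick the fastest (library, gpu_count) pair for one problem size.

    `benchmarks` maps a library name (e.g. "cuda_aware_mpi", "nccl") to a
    callable benchmark(gpu_count, problem_size) that runs the routine once.
    The callables here are placeholders, not the paper's actual routines.
    """
    best = None
    for lib, gpus in product(benchmarks, gpu_counts):
        run = benchmarks[lib]
        # Keep the best of several repetitions to reduce timing noise.
        elapsed = min(_timed(run, gpus, problem_size) for _ in range(repeats))
        if best is None or elapsed < best[0]:
            best = (elapsed, lib, gpus)
    return {"library": best[1], "gpus": best[2], "seconds": best[0]}


def _timed(run, gpus, problem_size):
    start = time.perf_counter()
    run(gpus, problem_size)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Dummy stand-ins for real multi-GPU benchmarks (purely illustrative).
    fake = {
        "cuda_aware_mpi": lambda gpus, n: sum(range(n // gpus)),
        "nccl": lambda gpus, n: sum(range(n // (2 * gpus))),
    }
    print(autotune(fake, gpu_counts=[2, 4, 8], problem_size=1_000_000))
```

In the approach summarised above, a hierarchical execution-time model combining experimental and theoretical components guides the selection instead; the exhaustive loop here is only a naive baseline for comparison.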