OHIO: Enhancing RDMA Scalability in Alltoall with Optimized Communication Overlap
Published in: | IEEE Micro, 2025-01, p. 1-9 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: | The presence of exascale computers has pushed a new boundary in computing capability, posing performance challenges in parallel programming models on how to exploit such systems efficiently. The Message Passing Interface (MPI) is a dominant model for parallel programming. Among its primitives, MPI_Alltoall is a communication-intensive operation widely employed in numerous applications, yet it remains challenging to optimize. Alltoall algorithms are mainly classified into flat and hierarchical. The hierarchical designs avoid the slowdown of intra/inter-node communication by decoupling them. Hierarchical designs also reduce network congestion by limiting concurrently injected messages. This work demonstrates hierarchical designs also improve connection scalability in RDMA networks. This improvement is attributed to the cache thrashing happening inside network adapters. All of these advantages of hierarchical schemes collectively contribute to the network scalability of Alltoall. We propose and evaluate a network-agnostic design on InfiniBand and Omni-Path clusters, showing benefits at both micro-benchmark and application levels over other MPI libraries. |
---|---|
ISSN: | 0272-1732, 1937-4143 |
DOI: | 10.1109/MM.2024.3524891 |