
Revisiting network congestion avoidance through adaptive packet-chaining reservation

Bibliographic Details
Published in: Computer networks (Amsterdam, Netherlands : 1999), 2022-07, Vol.212, p.109008, Article 109008
Main Authors: Wu, Ke; Dong, Dezun; Li, Cunlu; Xu, Weixia
Format: Article
Language: English
Description
Summary: Endpoint congestion is a bottleneck in high-performance computing (HPC) networks that severely impacts system performance, especially for latency-sensitive applications. When long messages (or flows) last far longer than the round-trip time (RTT), proactive or reactive countermeasures, which are effective solutions to endpoint congestion, can dynamically keep the injection rate within a proper range. However, many HPC applications produce hybrid traffic (a mix of short and long messages) and are dominated by short messages. Existing proactive congestion avoidance methods struggle to schedule the rapidly changing traffic caused by these short messages. In this paper, we first propose the Packet-Chaining Reservation Protocol (PCRP), a novel congestion management strategy that combines the advantages of proactive (scheduling the whole flow) and reactive (scheduling a single packet) congestion avoidance techniques. It is, in effect, a hybrid scheduling method: we select a chain of packets as a flexible reservation granularity between the whole flow and a single packet. PCRP allows small flows to be transmitted speculatively without being discarded, and gives them an appropriate priority based on the detected traffic conditions. PCRP can respond quickly to network conditions, effectively avoiding endpoint congestion and reducing the average flow delay. However, PCRP is only suitable for short-flow-dominant traffic and performs poorly on other traffic, because it can starve longer flows and thus introduce unacceptable tail delay. Therefore, we further propose the Packet-Chaining Reservation Protocol with Adaptive Framework (PCRP+), a reservation protocol that flexibly adjusts its scheduling strategy according to the network load. By adopting the adaptive framework, PCRP+ achieves and maintains low tail delay and broad applicability across traffic types.
We conduct extensive experiments to evaluate PCRP+ and compare it with the Speculative Reservation Protocol (SRP) and the Bilateral Flow Reservation Protocol (BFRP), the two most typical proactive reservation-based protocols. Evaluation results demonstrate that our design reduces flow latency by an average of 25.71%, 21.21%, and 29.01% for hotspot traffic, uniform traffic, and GPCNeT traffic, respectively.
•PCRP selects packet chaining as an efficient reservation granularity.
•Speculative pack
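The reservation-granularity idea in the summary (a chain of packets sitting between a whole flow and a single packet) can be illustrated with a minimal sketch. This is not the authors' implementation: `Chain`, `split_into_chains`, `reservations_needed`, and the `chain_len` parameter are hypothetical names, and this record does not say how PCRP sizes, prioritizes, or schedules chains. The sketch only shows how the number of reservation units changes with the chosen granularity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chain:
    """Hypothetical reservation unit: a run of consecutive packets of one flow."""
    flow_id: int
    first_packet: int  # index of the first packet covered by the chain
    num_packets: int   # number of packets covered by the chain


def split_into_chains(flow_id: int, flow_packets: int, chain_len: int) -> List[Chain]:
    """Split a flow of flow_packets packets into chains of at most chain_len packets.

    chain_len == 1 degenerates to per-packet scheduling;
    chain_len >= flow_packets degenerates to a whole-flow reservation.
    """
    if flow_packets < 0 or chain_len < 1:
        raise ValueError("flow_packets must be >= 0 and chain_len must be >= 1")
    chains = []
    for start in range(0, flow_packets, chain_len):
        # The last chain may be shorter than chain_len.
        chains.append(Chain(flow_id, start, min(chain_len, flow_packets - start)))
    return chains


def reservations_needed(flow_packets: int, chain_len: int) -> int:
    """Number of reservation units a flow produces at the given granularity."""
    return len(split_into_chains(0, flow_packets, chain_len))
```

For a 10-packet flow, `reservations_needed(10, 1)` corresponds to the per-packet extreme, `reservations_needed(10, 10)` to the whole-flow extreme, and an intermediate `chain_len` such as 4 falls between the two.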
ISSN: 1389-1286, 1872-7069
DOI:10.1016/j.comnet.2022.109008