LoWAR: Enhancing RDMA over Lossy WANs with Transparent Error Correction

Bibliographic Details
Main Authors: Zuo, Tianyu; Sun, Tao; Zhu, Shuyong; Li, Wenxiao; Lu, Lu; Du, Zongpeng; Zhang, Yujun
Format: Conference Proceeding
Language: English
Description
Summary: As geographically distributed applications continue to grow, the demand for high-speed, long-distance data transmission across wide area networks (WANs) has increased significantly. Remote Direct Memory Access (RDMA) is extensively deployed in data center networks (DCNs) for its high throughput, low latency, and reduced CPU utilization, and its extension to WANs is expected to fully leverage these benefits. However, existing RDMA solutions, while demonstrating superior performance in data centers, face a performance gap over WANs because they rely on DCN conditions for optimal performance and are not optimized for the high latency and loss rates of WANs. To bridge this gap, we introduce Lossy Wide-Area RDMA (LoWAR), a high-goodput, high-reliability RDMA solution for lossy WANs. LoWAR incorporates a forward error correction (FEC) shim layer to protect RDMA messages from packet loss, thus minimizing the inefficiency of retransmissions. It also fully offloads processing to RNICs with minimal computational overhead and storage burden, operating transparently on RNICs without requiring modifications to existing applications and networks. We implement a LoWAR prototype on an FPGA and evaluate its performance through testbed experiments. The results demonstrate LoWAR's enhanced performance in lossy WANs: in WANs with 40 ms RTT and 0.001% to 0.01% loss rates, LoWAR increases RDMA goodput by 2.05 to 5.01 times, reduces average flow completion times (FCTs) by 3.5% to 12.2%, and eliminates 99th percentile tail FCTs in most scenarios.
ISSN: 2766-8568
DOI: 10.1109/IWQoS61813.2024.10682853
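
The summary describes an FEC shim layer that lets a receiver repair packet loss without waiting for a retransmission. This record does not say which coding scheme LoWAR uses, so the following is only a minimal sketch of the general idea, assuming the simplest possible code: one XOR parity packet per block of equal-length data packets. The function names (`encode_block`, `recover_block`) and the block layout are hypothetical and are not taken from the paper.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_block(packets: List[bytes]) -> bytes:
    """Sender side: build one parity packet as the XOR of all data packets.

    Assumption: every packet in the block has the same length.
    """
    return reduce(xor_bytes, packets)


def recover_block(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Receiver side: rebuild at most one lost packet (marked None)."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        # A single parity packet cannot repair two losses; a real system
        # would fall back to retransmission or use a stronger code.
        raise ValueError("single-parity FEC can repair at most one loss per block")
    if not missing:
        return list(received)
    # XOR of the parity with every surviving packet yields the lost one.
    survivors = (p for p in received if p is not None)
    rebuilt = reduce(xor_bytes, survivors, parity)
    repaired = list(received)
    repaired[missing[0]] = rebuilt
    return repaired
```

In this sketch a block that loses one packet is repaired locally at the receiver, which avoids the retransmission round trip that is costly at WAN latencies such as the 40 ms RTT cited in the summary. The price is one extra packet of bandwidth per block.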