Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Over the past several years, Smart Network Interface Cards (NICs/SmartNICs) have rapidly grown in popularity. In particular, NVIDIA's BlueField line of SmartNICs has proven effective in a wide variety of uses: offloading communication in High-Performance Computing (HPC) applications, accelerating various stages of the Deep Learning (DL) pipeline, and serving the datacenter and virtualization workloads for which it was especially designed. The BlueField-3 DPU was released at the end of 2022 as a follow-up to its widely accepted predecessor, the BlueField-2. This work presents an in-depth performance evaluation of the two that compares a) both SmartNICs' on-chip capabilities (memory bandwidth, compute speed, etc.) and b) their offload capabilities through several micro-benchmarks and applications. In single-DPU programs, we see up to a 61% improvement in the latency of a memcpy operation and up to an 82% bandwidth improvement on the STREAM benchmark [8] on the BlueField-3. With a DPU-aware MPI library [1], we observe over 30% improvement at the micro-benchmark level when comparing staging-based designs on both SmartNICs, and up to nearly double that at the application level. However, the GVMI (Guest Virtual Machine ID) based designs in the same library improve by no more than 10% at the benchmark level and by less than 2% in applications because of their architecture-insensitive nature; that is, while CPU clock speed may affect the completion time of instructions, the performance of the GVMI-based designs in a DPU-aware MPI library is largely unaffected by swapping the BlueField-2 for a BlueField-3. |
---|---|
ISSN: | 2332-5569 |
DOI: | 10.1109/HOTI59126.2023.00020 |