Turbo: SmartNIC-enabled Dynamic Load Balancing of µs-scale RPCs

Bibliographic Details
Main Authors: Seyedroudbari, Hamed, Vanavasam, Srikar, Daglis, Alexandros
Format: Conference Proceeding
Language: English
Description
Summary: Online services are decomposed into fine-grained software components that communicate over the network using fine-grained Remote Procedure Calls (RPCs). Inter-server communication often exhibits patterns of wide RPC fan-outs between software tiers, raising the well-known tail-at-scale effect and necessitating mechanisms that curb long response tail latencies. When handling µs-scale RPCs, request distribution across the cores of multicore servers is a major determinant of the resulting tail latency. Software approaches for inter-core RPC balancing introduce considerable overheads, throttling a server's peak throughput. On the other hand, existing NIC-based hardware mechanisms ameliorate software and inter-core synchronization overheads, but result in inter-core load imbalance that leaves significant performance improvement headroom. We introduce Turbo, a hardware on-NIC load-balancing mechanism that achieves near-optimal inter-core load distribution for the most fine-grained, light-tailed RPCs with service times of only a couple of µs. We implement Turbo on a programmable NIC and evaluate it on a range of different service time distributions and with a high-performance Key-Value store. Compared to hardware NIC-based mechanisms that statically spread load across cores, Turbo boosts throughput under a 99th-percentile latency Service Level Objective (SLO) of 30× the service time by up to 5×, and by up to 95× for a more aggressive 10× SLO target.
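The gap between static and dynamic inter-core load distribution described in the summary can be illustrated with a small queueing simulation. The sketch below is not Turbo's mechanism and does not model a NIC. It is an idealized model under assumed parameters (16 cores, 80% load, exponentially distributed service times with mean 1, Poisson arrivals), and the function name `simulate` and the policy labels are hypothetical. The "static" policy picks a core at random per request, loosely mimicking hash-based spreading, while the "dynamic" policy sends each request to the core that will be free soonest, which assumes perfect knowledge of queued work.

```python
import random


def simulate(policy, n_cores=16, load=0.8, n_requests=20_000, seed=1):
    """Return the 99th-percentile latency (in units of mean service time).

    policy: "static"  -> pick a core uniformly at random (assumed stand-in
                         for hash-based spreading)
            "dynamic" -> pick the core that becomes free earliest
                         (idealized; assumes exact knowledge of queued work)
    """
    rng = random.Random(seed)            # arrivals and service times
    rng_core = random.Random(seed + 1)   # core choice for the static policy
    free_at = [0.0] * n_cores            # time at which each core drains its queue
    now = 0.0
    latencies = []
    for _ in range(n_requests):
        # Poisson arrivals; aggregate rate = load * n_cores since mean service = 1.
        now += rng.expovariate(load * n_cores)
        service = rng.expovariate(1.0)
        if policy == "static":
            core = rng_core.randrange(n_cores)
        else:
            core = min(range(n_cores), key=free_at.__getitem__)
        start = max(now, free_at[core])  # wait if the chosen core is busy
        free_at[core] = start + service
        latencies.append(free_at[core] - now)
    latencies.sort()
    return latencies[int(0.99 * len(latencies))]


if __name__ == "__main__":
    for policy in ("static", "dynamic"):
        print(policy, "p99 latency:", round(simulate(policy), 2))
```

In this toy model the static policy lets some cores build up queues while others sit idle, so its 99th-percentile latency is noticeably higher at the same load. Equivalently, the dynamic policy can sustain more load before violating a given tail-latency SLO, which is the kind of headroom the summary refers to.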
ISSN: 2378-203X
DOI: 10.1109/HPCA56546.2023.10071135