FULL-KV: Flexible and Ultra-Low-Latency In-Memory Key-Value Store System Design on CPU-FPGA

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2020-08, Vol. 31 (8), p. 1828-1444
Main Authors: Qiu, Yunhui, Xie, Jinyu, Lv, Hankun, Yin, Wenbo, Luk, Wai-Shing, Wang, Lingli, Yu, Bowei, Chen, Hua, Ge, Xianjun, Liao, Zhijian, Shi, Xiaozhong
Format: Article
Language: English
Description
Summary: In-memory key-value stores (IMKVS) have gained great popularity in data centers. However, because of the general-purpose von Neumann computer architecture, big data brings great challenges in performance and power consumption. Remote direct memory access (RDMA) technology, which supports zero-copy networking, can partly alleviate the problem but is still not efficient for KVS. To overcome this problem, we present a flexible and ultra-low-latency IMKVS system named FULL-KV, based on a CPU-FPGA heterogeneous architecture. The FPGA serves as a KVS accelerator that bypasses the CPU and implements both the network stacks and the KVS processing with a highly parallel hardware architecture. FULL-KV achieves a system latency as low as 1.5 μs for PUT and 2.2 μs for GET operations, which is 3.0x and 1.5x faster, respectively, than current state-of-the-art hardware-based KVS systems. In addition, FULL-KV supports 4x larger values (up to 4M bytes). Given a total Ethernet bandwidth of 20 Gbps, the peak throughput of single-node FULL-KV reaches 26.0 million key-value operations per second (Mops). In a two-node test system with a commercial Ethernet switch, the peak throughput reaches 52 Mops, demonstrating the system's scalability and practicality.
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2020.2973965