Loading…

CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems

Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, due to the inherent read amplification characteristics of the LSM-tree, RoH faces performance degradation issues in re...

Full description

Saved in:
Bibliographic Details
Published in:The Journal of supercomputing 2025, Vol.81 (1), Article 57
Main Authors: Huang, Feixiong, Pan, Yubiao, Zhang, Huizhen, Lin, Mingwei
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, due to the inherent read amplification characteristics of the LSM-tree, RoH faces performance degradation issues in read operations, including gets and scans, caused by the transmission of a large amount of irrelevant data between data nodes and the master node. To address these challenges, we firstly propose a computation offloading strategy that shifts get and scan operations from the master node to data nodes to reduce unnecessary data transmission. To boost the read operations, we then design concurrent execution methods: Concurrent-Get and Concurrent-Scan. Finally, we implement our prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate the effectiveness of CDNRocks. The results indicate that CDNRocks outperforms RoH by exhibiting better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared to other distributed key-value storage systems like Cassandra and HBase, CDNRocks excels in read performance within small cluster environments(with node counts up to 9), while achieving a balanced trade-off in load performance and efficient CPU and memory utilization.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-024-06526-7