
CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems

Bibliographic Details
Published in: The Journal of Supercomputing, 2025, Vol. 81 (1), Article 57
Main Authors: Huang, Feixiong, Pan, Yubiao, Zhang, Huizhen, Lin, Mingwei
Format: Article
Language: English
cites cdi_FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3
container_issue 1
container_title The Journal of supercomputing
container_volume 81
creator Huang, Feixiong
Pan, Yubiao
Zhang, Huizhen
Lin, Mingwei
description Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, because of the read amplification inherent in the LSM-tree, RoH suffers degraded read performance for both gets and scans, caused by the transmission of large amounts of irrelevant data between the data nodes and the master node. To address these challenges, we first propose a computation offloading strategy that shifts get and scan operations from the master node to the data nodes, reducing unnecessary data transmission. To accelerate read operations further, we then design two concurrent execution methods: Concurrent-Get and Concurrent-Scan. Finally, we implement a prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate its effectiveness. The results indicate that CDNRocks outperforms RoH, delivering better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared with other distributed key-value storage systems such as Cassandra and HBase, CDNRocks achieves better read performance in small clusters (up to 9 nodes), while offering a balanced trade-off in load performance and efficient CPU and memory utilization.
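The abstract describes the two ideas only at a high level: evaluating gets and scans on the data nodes so that only matching entries travel to the master, and running those node-local operations concurrently (Concurrent-Get and Concurrent-Scan). The Python sketch below illustrates that general pattern under our own simplifying assumptions; it is not the authors' implementation. Each hypothetical `DataNode` holds a flat dictionary standing in for its local SST files, every entry carries a sequence number so that the newest version wins, and a value of `None` marks a deletion (tombstone). The names `DataNode`, `local_get`, `local_scan`, `concurrent_get`, and `concurrent_scan` are invented for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a data node's local SST files:
# key -> (sequence_number, value); a value of None is a tombstone (deletion).
Entry = Tuple[int, Optional[str]]


@dataclass
class DataNode:
    entries: Dict[str, Entry] = field(default_factory=dict)

    def local_get(self, key: str) -> Optional[Entry]:
        # Evaluated on the data node: at most one entry is sent back,
        # instead of the data blocks that might contain the key.
        return self.entries.get(key)

    def local_scan(self, start: str, end: str) -> List[Tuple[str, int, Optional[str]]]:
        # Evaluated on the data node: only keys in [start, end) are returned, sorted by key.
        hits = [(k, seq, v) for k, (seq, v) in self.entries.items() if start <= k < end]
        return sorted(hits, key=lambda e: e[0])


def concurrent_get(nodes: List[DataNode], key: str) -> Optional[str]:
    # Send the lookup to all data nodes in parallel and keep the newest version.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        hits = [h for h in pool.map(lambda n: n.local_get(key), nodes) if h is not None]
    if not hits:
        return None
    _seq, value = max(hits, key=lambda h: h[0])
    return value  # None when the newest version is a tombstone


def concurrent_scan(nodes: List[DataNode], start: str, end: str) -> List[Tuple[str, str]]:
    # Run the range scan on all data nodes in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        parts = list(pool.map(lambda n: n.local_scan(start, end), nodes))
    # k-way merge of the key-sorted partial results; for equal keys the
    # highest sequence number (newest version) comes first.
    merged = heapq.merge(*parts, key=lambda e: (e[0], -e[1]))
    result: List[Tuple[str, str]] = []
    last_key: Optional[str] = None
    for k, _seq, v in merged:
        if k == last_key:
            continue  # older version of a key that is already resolved
        last_key = k
        if v is not None:  # drop keys whose newest version is a deletion
            result.append((k, v))
    return result


if __name__ == "__main__":
    nodes = [
        DataNode({"a": (1, "old-a"), "c": (2, "c1")}),
        DataNode({"a": (5, "new-a"), "b": (3, None)}),
        DataNode({"b": (1, "b0"), "d": (4, "d1")}),
    ]
    print(concurrent_get(nodes, "a"))        # new-a
    print(concurrent_get(nodes, "b"))        # None (tombstone at seq 3 hides seq 1)
    print(concurrent_scan(nodes, "a", "d"))  # [('a', 'new-a'), ('c', 'c1')]
```

In this sketch the master node only resolves versions and merges already-filtered, key-sorted partial results with `heapq.merge`, which is where the reduction in transferred data comes from. A real deployment would read SST files through RocksDB on each data node rather than a dictionary; that part is deliberately omitted.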
doi_str_mv 10.1007/s11227-024-06526-7
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3119653430</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3119653430</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3</originalsourceid><addsrcrecordid>eNp9kMtOIzEQRS0EEoHhB1hZYm2m_OoHOwgwIIVBmoG15dhlaEjHwXYzyp4PpyFI7GZVtTj3luoQcsjhmAPUPzPnQtQMhGJQaVGxeotMuK4lA9WobTKBVgBrtBK7ZC_nJwBQspYT8jY9__0nuud8Ql3sV0Ox8wVSb4uly-gx039deaSfxPkZLZF2_SrFV6TlEWlC6-kKU4ipt0uHNAY6-3vDSkJkc5vRU9_lkrr5UMb9Gdfs1S4GpLnEZB_Guc4F-_yD7AS7yHjwNffJ_eXF3fSKzW5_XU9PZ8wJgMIc2kpXmmvhfG0doFZtY0MD3lVCtdyHOTpsRAihrbX0srKNRrRtoxrdQpD75GjTO77wMmAu5ikOaTmeNJLzttJSSRgpsaFcijknDGaVut6mteFgPmybjW0z2jaftk09huQmlEd4-YDpu_o_qXcI8IRW</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3119653430</pqid></control><display><type>article</type><title>CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems</title><source>Springer Nature</source><creator>Huang, Feixiong ; Pan, Yubiao ; Zhang, Huizhen ; Lin, Mingwei</creator><creatorcontrib>Huang, Feixiong ; Pan, Yubiao ; Zhang, Huizhen ; Lin, Mingwei</creatorcontrib><description>Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, due to the inherent read amplification characteristics of the LSM-tree, RoH faces performance degradation issues in read operations, including gets and scans, caused by the transmission of a large amount of irrelevant data between data nodes and the master node. To address these challenges, we firstly propose a computation offloading strategy that shifts get and scan operations from the master node to data nodes to reduce unnecessary data transmission. 
To boost the read operations, we then design concurrent execution methods: Concurrent-Get and Concurrent-Scan. Finally, we implement our prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate the effectiveness of CDNRocks. The results indicate that CDNRocks outperforms RoH by exhibiting better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared to other distributed key-value storage systems like Cassandra and HBase, CDNRocks excels in read performance within small cluster environments(with node counts up to 9), while achieving a balanced trade-off in load performance and efficient CPU and memory utilization.</description><identifier>ISSN: 0920-8542</identifier><identifier>EISSN: 1573-0484</identifier><identifier>DOI: 10.1007/s11227-024-06526-7</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Compilers ; Computation offloading ; Computer Science ; Data storage ; Data transmission ; Interpreters ; Nodes ; Performance degradation ; Processor Architectures ; Programming Languages ; Storage systems</subject><ispartof>The Journal of supercomputing, 2025, Vol.81 (1), Article 57</ispartof><rights>The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Huang, Feixiong</creatorcontrib><creatorcontrib>Pan, Yubiao</creatorcontrib><creatorcontrib>Zhang, Huizhen</creatorcontrib><creatorcontrib>Lin, Mingwei</creatorcontrib><title>CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems</title><title>The Journal of supercomputing</title><addtitle>J Supercomput</addtitle><description>Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, due to the inherent read amplification characteristics of the LSM-tree, RoH faces performance degradation issues in read operations, including gets and scans, caused by the transmission of a large amount of irrelevant data between data nodes and the master node. To address these challenges, we firstly propose a computation offloading strategy that shifts get and scan operations from the master node to data nodes to reduce unnecessary data transmission. To boost the read operations, we then design concurrent execution methods: Concurrent-Get and Concurrent-Scan. 
Finally, we implement our prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate the effectiveness of CDNRocks. The results indicate that CDNRocks outperforms RoH by exhibiting better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared to other distributed key-value storage systems like Cassandra and HBase, CDNRocks excels in read performance within small cluster environments(with node counts up to 9), while achieving a balanced trade-off in load performance and efficient CPU and memory utilization.</description><subject>Compilers</subject><subject>Computation offloading</subject><subject>Computer Science</subject><subject>Data storage</subject><subject>Data transmission</subject><subject>Interpreters</subject><subject>Nodes</subject><subject>Performance degradation</subject><subject>Processor Architectures</subject><subject>Programming Languages</subject><subject>Storage systems</subject><issn>0920-8542</issn><issn>1573-0484</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2025</creationdate><recordtype>article</recordtype><recordid>eNp9kMtOIzEQRS0EEoHhB1hZYm2m_OoHOwgwIIVBmoG15dhlaEjHwXYzyp4PpyFI7GZVtTj3luoQcsjhmAPUPzPnQtQMhGJQaVGxeotMuK4lA9WobTKBVgBrtBK7ZC_nJwBQspYT8jY9__0nuud8Ql3sV0Ox8wVSb4uly-gx039deaSfxPkZLZF2_SrFV6TlEWlC6-kKU4ipt0uHNAY6-3vDSkJkc5vRU9_lkrr5UMb9Gdfs1S4GpLnEZB_Guc4F-_yD7AS7yHjwNffJ_eXF3fSKzW5_XU9PZ8wJgMIc2kpXmmvhfG0doFZtY0MD3lVCtdyHOTpsRAihrbX0srKNRrRtoxrdQpD75GjTO77wMmAu5ikOaTmeNJLzttJSSRgpsaFcijknDGaVut6mteFgPmybjW0z2jaftk09huQmlEd4-YDpu_o_qXcI8IRW</recordid><startdate>2025</startdate><enddate>2025</enddate><creator>Huang, Feixiong</creator><creator>Pan, Yubiao</creator><creator>Zhang, Huizhen</creator><creator>Lin, Mingwei</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>2025</creationdate><title>CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems</title><author>Huang, Feixiong ; Pan, Yubiao ; Zhang, Huizhen ; Lin, Mingwei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2025</creationdate><topic>Compilers</topic><topic>Computation offloading</topic><topic>Computer Science</topic><topic>Data storage</topic><topic>Data transmission</topic><topic>Interpreters</topic><topic>Nodes</topic><topic>Performance degradation</topic><topic>Processor Architectures</topic><topic>Programming Languages</topic><topic>Storage systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Huang, Feixiong</creatorcontrib><creatorcontrib>Pan, Yubiao</creatorcontrib><creatorcontrib>Zhang, Huizhen</creatorcontrib><creatorcontrib>Lin, Mingwei</creatorcontrib><collection>CrossRef</collection><jtitle>The Journal of supercomputing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Huang, Feixiong</au><au>Pan, Yubiao</au><au>Zhang, Huizhen</au><au>Lin, Mingwei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems</atitle><jtitle>The Journal of supercomputing</jtitle><stitle>J Supercomput</stitle><date>2025</date><risdate>2025</risdate><volume>81</volume><issue>1</issue><artnum>57</artnum><issn>0920-8542</issn><eissn>1573-0484</eissn><abstract>Deploying LSM-tree-based key-value stores on distributed file systems is a 
common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, due to the inherent read amplification characteristics of the LSM-tree, RoH faces performance degradation issues in read operations, including gets and scans, caused by the transmission of a large amount of irrelevant data between data nodes and the master node. To address these challenges, we firstly propose a computation offloading strategy that shifts get and scan operations from the master node to data nodes to reduce unnecessary data transmission. To boost the read operations, we then design concurrent execution methods: Concurrent-Get and Concurrent-Scan. Finally, we implement our prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate the effectiveness of CDNRocks. The results indicate that CDNRocks outperforms RoH by exhibiting better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared to other distributed key-value storage systems like Cassandra and HBase, CDNRocks excels in read performance within small cluster environments(with node counts up to 9), while achieving a balanced trade-off in load performance and efficient CPU and memory utilization.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s11227-024-06526-7</doi></addata></record>
fulltext fulltext
identifier ISSN: 0920-8542
ispartof The Journal of supercomputing, 2025, Vol.81 (1), Article 57
issn 0920-8542
1573-0484
language eng
recordid cdi_proquest_journals_3119653430
source Springer Nature
subjects Compilers
Computation offloading
Computer Science
Data storage
Data transmission
Interpreters
Nodes
Performance degradation
Processor Architectures
Programming Languages
Storage systems
title CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T04%3A19%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CDNRocks:%20computable%20data%20nodes%20with%20RocksDB%20to%20improve%20the%20read%20performance%20of%20LSM-tree-based%20distributed%20key-value%20storage%20systems&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Huang,%20Feixiong&rft.date=2025&rft.volume=81&rft.issue=1&rft.artnum=57&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-024-06526-7&rft_dat=%3Cproquest_cross%3E3119653430%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3119653430&rft_id=info:pmid/&rfr_iscdi=true