CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems
Published in: | The Journal of Supercomputing 2025, Vol.81 (1), Article 57 |
---|---|
Main Authors: | Huang, Feixiong; Pan, Yubiao; Zhang, Huizhen; Lin, Mingwei |
Format: | Article |
Language: | English |
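Subjects: | Compilers; Computation offloading; Computer Science; Data storage; Data transmission; Interpreters; Nodes; Performance degradation; Processor Architectures; Programming Languages; Storage systems |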
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3 |
container_end_page | |
container_issue | 1 |
container_start_page | |
container_title | The Journal of supercomputing |
container_volume | 81 |
creator | Huang, Feixiong; Pan, Yubiao; Zhang, Huizhen; Lin, Mingwei |
description | Deploying LSM-tree-based key-value stores on distributed file systems is a common approach to building distributed key-value storage systems, such as RocksDB on HDFS (RoH). However, because of the inherent read amplification of the LSM-tree, RoH suffers degraded read performance, for both gets and scans, as large amounts of irrelevant data are transmitted between the data nodes and the master node. To address these challenges, we first propose a computation offloading strategy that shifts get and scan operations from the master node to the data nodes to reduce unnecessary data transmission. To further boost read operations, we then design two concurrent execution methods: Concurrent-Get and Concurrent-Scan. Finally, we implement our prototype system, CDNRocks, based on RoH, and conduct extensive experiments in a distributed multi-node environment to demonstrate the effectiveness of CDNRocks. The results indicate that CDNRocks outperforms RoH, with better read performance, less data transmission, and lower CPU utilization on the master node. Furthermore, compared with other distributed key-value storage systems such as Cassandra and HBase, CDNRocks excels in read performance in small cluster environments (with up to 9 nodes), while achieving a balanced trade-off in load performance and efficient CPU and memory utilization. |
doi_str_mv | 10.1007/s11227-024-06526-7 |
format | article |
fullrecord | Publisher: New York: Springer US. Abbreviated title: J Supercomput. Peer reviewed. Rights: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
fulltext | fulltext |
identifier | ISSN: 0920-8542 |
ispartof | The Journal of supercomputing, 2025, Vol.81 (1), Article 57 |
issn | 0920-8542 (ISSN); 1573-0484 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_3119653430 |
source | Springer Nature |
subjects | Compilers; Computation offloading; Computer Science; Data storage; Data transmission; Interpreters; Nodes; Performance degradation; Processor Architectures; Programming Languages; Storage systems |
title | CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T04%3A19%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CDNRocks:%20computable%20data%20nodes%20with%20RocksDB%20to%20improve%20the%20read%20performance%20of%20LSM-tree-based%20distributed%20key-value%20storage%20systems&rft.jtitle=The%20Journal%20of%20supercomputing&rft.au=Huang,%20Feixiong&rft.date=2025&rft.volume=81&rft.issue=1&rft.artnum=57&rft.issn=0920-8542&rft.eissn=1573-0484&rft_id=info:doi/10.1007/s11227-024-06526-7&rft_dat=%3Cproquest_cross%3E3119653430%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-cea6565152cd7ac0e5498af80dc62491dfbece82fff9753d36a85eea9848590f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3119653430&rft_id=info:pmid/&rfr_iscdi=true |
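The abstract describes two ideas without giving implementation details. First, get and scan are offloaded to the data nodes so that only matching entries travel back to the master. Second, those per-node requests are executed concurrently (Concurrent-Get, Concurrent-Scan).

The following is a minimal Python sketch of that general pattern, not the CDNRocks implementation. It rests on several assumptions:

* An in-memory dict stands in for the SST data each node stores on HDFS.
* Versions are resolved by a per-entry sequence number.
* A value of `None` marks a deletion tombstone.

The names `DataNode`, `Entry`, `local_get`, `local_scan`, `concurrent_get`, and `concurrent_scan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    seqno: int               # higher sequence number = newer write (assumption)
    value: Optional[bytes]   # None marks a deletion tombstone (assumption)


@dataclass
class DataNode:
    """Hypothetical data node; a dict stands in for the SST data it stores."""
    name: str
    sst: Dict[bytes, Entry] = field(default_factory=dict)
    bytes_sent: int = 0      # bytes this node ships back to the master

    def local_get(self, key: bytes) -> Optional[Entry]:
        # The lookup runs on the data node; only a matching entry is returned.
        entry = self.sst.get(key)
        if entry is not None:
            self.bytes_sent += len(key) + len(entry.value or b"")
        return entry

    def local_scan(self, start: bytes, end: bytes) -> List[Tuple[bytes, Entry]]:
        # Filter the range [start, end) locally and return the hits sorted by key.
        hits = sorted(
            ((k, e) for k, e in self.sst.items() if start <= k < end),
            key=lambda kv: kv[0],
        )
        self.bytes_sent += sum(len(k) + len(e.value or b"") for k, e in hits)
        return hits


def concurrent_get(nodes: List[DataNode], key: bytes) -> Optional[bytes]:
    """Query every node in parallel; return the newest live value, else None."""
    if not nodes:
        return None
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(lambda n: n.local_get(key), nodes))
    hits = [r for r in results if r is not None]
    if not hits:
        return None
    newest = max(hits, key=lambda e: e.seqno)
    return newest.value  # None if the newest version is a tombstone


def concurrent_scan(nodes: List[DataNode], start: bytes, end: bytes) -> List[Tuple[bytes, bytes]]:
    """Scan every node in parallel, then merge by key keeping the newest version."""
    if not nodes:
        return []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(lambda n: n.local_scan(start, end), nodes))
    merged: Dict[bytes, Entry] = {}
    for part in partials:
        for k, e in part:
            if k not in merged or e.seqno > merged[k].seqno:
                merged[k] = e
    # Drop keys whose newest version is a tombstone.
    return [(k, e.value) for k, e in sorted(merged.items()) if e.value is not None]
```

The master only ever receives matching entries, which is the transmission saving the abstract attributes to offloading. A real LSM-tree read would also consult the memtable and use per-file key ranges or filters to skip nodes that cannot hold the key; the sketch omits both.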