Loading…

GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing

With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencie...

Full description

Saved in:
Bibliographic Details
Published in:ACM transactions on architecture and code optimization 2023-09, Vol.20 (3), p.1-24
Main Authors: Zhao, Jin, Zhang, Yu, He, Ligang, Li, Qikun, Zhang, Xiang, Jiang, Xinyu, Yu, Hui, Liao, Xiaofei, Jin, Hai, Gu, Lin, Liu, Haikun, He, Bingsheng, Zhang, Ji, Song, Xianzheng, Wang, Lin, Zhou, Jun
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites cdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3
container_end_page 24
container_issue 3
container_start_page 1
container_title ACM transactions on architecture and code optimization
container_volume 20
creator Zhao, Jin
Zhang, Yu
He, Ligang
Li, Qikun
Zhang, Xiang
Jiang, Xinyu
Yu, Hui
Liao, Xiaofei
Jin, Hai
Gu, Lin
Liu, Haikun
He, Bingsheng
Zhang, Ji
Song, Xianzheng
Wang, Lin
Zhou, Jun
description With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.
doi_str_mv 10.1145/3600091
format article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3600091</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3600091</sourcerecordid><originalsourceid>FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</originalsourceid><addsrcrecordid>eNo9kEFPAjEQRhujiYjGu6fePK1O2-ku640gIgmJJuJ50y2zWF26pN3V8O8FAU8z883LO3yMXQu4EwL1vUoBIBcnrCc0YqLyTJ0ed52m5-wixk8AmUuAHvuaBLP-mHeeHvjQ83FVOevIt_yR1uQX5O0mMT8mEH_rytgG0xJvGz6sa_p2u2MaAi272gTXbrjzfNR4222zreJPzV9DYylG55eX7KwydaSrw-yz96fxfPSczF4m09Fwlhip8jYZYAYLKTBTAgSi1ZiWJZUgrURrSgTQqCqtUaQLVZE0emB0nmYDQDQkreqz273XhibGQFWxDm5lwqYQUOw6Kg4dbcmbPWns6h86Pn8BQFNhIA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</creator><creatorcontrib>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</creatorcontrib><description>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</description><identifier>ISSN: 1544-3566</identifier><identifier>EISSN: 1544-3973</identifier><identifier>DOI: 10.1145/3600091</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Computer systems organization ; Parallel architectures ; Special purpose systems</subject><ispartof>ACM transactions on architecture and code optimization, 2023-09, Vol.20 (3), p.1-24</ispartof><rights>Copyright held by the owner/author(s).</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</cites><orcidid>0000-0001-6033-6102 ; 0000-0002-6525-9334 ; 0000-0002-5671-0576 ; 0000-0003-0029-6027 ; 0000-0002-1244-2880 ; 0000-0003-4290-1408 ; 0000-0003-4217-7886 ; 0000-0001-6302-813X ; 0000-0002-3934-7605 ; 0009-0007-9559-9800 ; 0000-0002-9807-9479 ; 0009-0005-6700-4713 ; 0009-0008-6054-6036 ; 0000-0002-6559-6111 ; 0000-0003-0718-8045 ; 0000-0001-8618-4581</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27898,27899</link.rule.ids></links><search><creatorcontrib>Zhao, Jin</creatorcontrib><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>He, Ligang</creatorcontrib><creatorcontrib>Li, Qikun</creatorcontrib><creatorcontrib>Zhang, Xiang</creatorcontrib><creatorcontrib>Jiang, Xinyu</creatorcontrib><creatorcontrib>Yu, Hui</creatorcontrib><creatorcontrib>Liao, Xiaofei</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><creatorcontrib>Gu, Lin</creatorcontrib><creatorcontrib>Liu, Haikun</creatorcontrib><creatorcontrib>He, Bingsheng</creatorcontrib><creatorcontrib>Zhang, Ji</creatorcontrib><creatorcontrib>Song, Xianzheng</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Zhou, Jun</creatorcontrib><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><title>ACM transactions on architecture and code optimization</title><addtitle>ACM TACO</addtitle><description>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</description><subject>Computer systems organization</subject><subject>Parallel architectures</subject><subject>Special purpose systems</subject><issn>1544-3566</issn><issn>1544-3973</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNo9kEFPAjEQRhujiYjGu6fePK1O2-ku640gIgmJJuJ50y2zWF26pN3V8O8FAU8z883LO3yMXQu4EwL1vUoBIBcnrCc0YqLyTJ0ed52m5-wixk8AmUuAHvuaBLP-mHeeHvjQ83FVOevIt_yR1uQX5O0mMT8mEH_rytgG0xJvGz6sa_p2u2MaAi272gTXbrjzfNR4222zreJPzV9DYylG55eX7KwydaSrw-yz96fxfPSczF4m09Fwlhip8jYZYAYLKTBTAgSi1ZiWJZUgrURrSgTQqCqtUaQLVZE0emB0nmYDQDQkreqz273XhibGQFWxDm5lwqYQUOw6Kg4dbcmbPWns6h86Pn8BQFNhIA</recordid><startdate>20230930</startdate><enddate>20230930</enddate><creator>Zhao, Jin</creator><creator>Zhang, Yu</creator><creator>He, Ligang</creator><creator>Li, Qikun</creator><creator>Zhang, Xiang</creator><creator>Jiang, Xinyu</creator><creator>Yu, Hui</creator><creator>Liao, Xiaofei</creator><creator>Jin, Hai</creator><creator>Gu, Lin</creator><creator>Liu, Haikun</creator><creator>He, Bingsheng</creator><creator>Zhang, Ji</creator><creator>Song, Xianzheng</creator><creator>Wang, Lin</creator><creator>Zhou, Jun</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-6033-6102</orcidid><orcidid>https://orcid.org/0000-0002-6525-9334</orcidid><orcidid>https://orcid.org/0000-0002-5671-0576</orcidid><orcidid>https://orcid.org/0000-0003-0029-6027</orcidid><orcidid>https://orcid.org/0000-0002-1244-2880</orcidid><orcidid>https://orcid.org/0000-0003-4290-1408</orcidid><orcidid>https://orcid.org/0000-0003-4217-7886</orcidid><orcidid>https://orcid.org/0000-0001-6302-813X</orcidid><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0009-0007-9559-9800</orcidid><orcidid>https://orcid.org/0000-0002-9807-9479</orcidid><orcidid>https://orcid.org/0009-0005-6700-4713</orcidid><orcidid>https://orcid.org/0009-0008-6054-6036</orcidid><orcidid>https://orcid.org/0000-0002-6559-6111</orcidid><orcidid>https://orcid.org/0000-0003-0718-8045</orcidid><orcidid>https://orcid.org/0000-0001-8618-4581</orcidid></search><sort><creationdate>20230930</creationdate><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><author>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer systems organization</topic><topic>Parallel architectures</topic><topic>Special purpose systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Jin</creatorcontrib><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>He, Ligang</creatorcontrib><creatorcontrib>Li, Qikun</creatorcontrib><creatorcontrib>Zhang, Xiang</creatorcontrib><creatorcontrib>Jiang, Xinyu</creatorcontrib><creatorcontrib>Yu, Hui</creatorcontrib><creatorcontrib>Liao, Xiaofei</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><creatorcontrib>Gu, Lin</creatorcontrib><creatorcontrib>Liu, Haikun</creatorcontrib><creatorcontrib>He, Bingsheng</creatorcontrib><creatorcontrib>Zhang, Ji</creatorcontrib><creatorcontrib>Song, Xianzheng</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Zhou, Jun</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on architecture and code optimization</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhao, Jin</au><au>Zhang, Yu</au><au>He, Ligang</au><au>Li, Qikun</au><au>Zhang, Xiang</au><au>Jiang, Xinyu</au><au>Yu, Hui</au><au>Liao, Xiaofei</au><au>Jin, Hai</au><au>Gu, Lin</au><au>Liu, Haikun</au><au>He, Bingsheng</au><au>Zhang, Ji</au><au>Song, Xianzheng</au><au>Wang, Lin</au><au>Zhou, Jun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</atitle><jtitle>ACM transactions on architecture and code optimization</jtitle><stitle>ACM TACO</stitle><date>2023-09-30</date><risdate>2023</risdate><volume>20</volume><issue>3</issue><spage>1</spage><epage>24</epage><pages>1-24</pages><issn>1544-3566</issn><eissn>1544-3973</eissn><abstract>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3600091</doi><tpages>24</tpages><orcidid>https://orcid.org/0000-0001-6033-6102</orcidid><orcidid>https://orcid.org/0000-0002-6525-9334</orcidid><orcidid>https://orcid.org/0000-0002-5671-0576</orcidid><orcidid>https://orcid.org/0000-0003-0029-6027</orcidid><orcidid>https://orcid.org/0000-0002-1244-2880</orcidid><orcidid>https://orcid.org/0000-0003-4290-1408</orcidid><orcidid>https://orcid.org/0000-0003-4217-7886</orcidid><orcidid>https://orcid.org/0000-0001-6302-813X</orcidid><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0009-0007-9559-9800</orcidid><orcidid>https://orcid.org/0000-0002-9807-9479</orcidid><orcidid>https://orcid.org/0009-0005-6700-4713</orcidid><orcidid>https://orcid.org/0009-0008-6054-6036</orcidid><orcidid>https://orcid.org/0000-0002-6559-6111</orcidid><orcidid>https://orcid.org/0000-0003-0718-8045</orcidid><orcidid>https://orcid.org/0000-0001-8618-4581</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1544-3566
ispartof ACM transactions on architecture and code optimization, 2023-09, Vol.20 (3), p.1-24
issn 1544-3566
1544-3973
language eng
recordid cdi_crossref_primary_10_1145_3600091
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Computer systems organization
Parallel architectures
Special purpose systems
title GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-27T04%3A29%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GraphTune:%20An%20Efficient%20Dependency-aware%20Substrate%20to%20Alleviate%20Irregularity%20in%20Concurrent%20Graph%20Processing&rft.jtitle=ACM%20transactions%20on%20architecture%20and%20code%20optimization&rft.au=Zhao,%20Jin&rft.date=2023-09-30&rft.volume=20&rft.issue=3&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1544-3566&rft.eissn=1544-3973&rft_id=info:doi/10.1145/3600091&rft_dat=%3Cacm_cross%3E3600091%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true