Loading…
GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing
With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencie...
Saved in:
Published in: | ACM transactions on architecture and code optimization 2023-09, Vol.20 (3), p.1-24 |
---|---|
Main Authors: | , , , , , , , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3 |
container_end_page | 24 |
container_issue | 3 |
container_start_page | 1 |
container_title | ACM transactions on architecture and code optimization |
container_volume | 20 |
creator | Zhao, Jin Zhang, Yu He, Ligang Li, Qikun Zhang, Xiang Jiang, Xinyu Yu, Hui Liao, Xiaofei Jin, Hai Gu, Lin Liu, Haikun He, Bingsheng Zhang, Ji Song, Xianzheng Wang, Lin Zhou, Jun |
description | With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively. |
doi_str_mv | 10.1145/3600091 |
format | article |
fullrecord | <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3600091</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3600091</sourcerecordid><originalsourceid>FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</originalsourceid><addsrcrecordid>eNo9kEFPAjEQRhujiYjGu6fePK1O2-ku640gIgmJJuJ50y2zWF26pN3V8O8FAU8z883LO3yMXQu4EwL1vUoBIBcnrCc0YqLyTJ0ed52m5-wixk8AmUuAHvuaBLP-mHeeHvjQ83FVOevIt_yR1uQX5O0mMT8mEH_rytgG0xJvGz6sa_p2u2MaAi272gTXbrjzfNR4222zreJPzV9DYylG55eX7KwydaSrw-yz96fxfPSczF4m09Fwlhip8jYZYAYLKTBTAgSi1ZiWJZUgrURrSgTQqCqtUaQLVZE0emB0nmYDQDQkreqz273XhibGQFWxDm5lwqYQUOw6Kg4dbcmbPWns6h86Pn8BQFNhIA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</creator><creatorcontrib>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</creatorcontrib><description>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</description><identifier>ISSN: 1544-3566</identifier><identifier>EISSN: 1544-3973</identifier><identifier>DOI: 10.1145/3600091</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Computer systems organization ; Parallel architectures ; Special purpose systems</subject><ispartof>ACM transactions on architecture and code optimization, 2023-09, Vol.20 (3), p.1-24</ispartof><rights>Copyright held by the owner/author(s).</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</cites><orcidid>0000-0001-6033-6102 ; 0000-0002-6525-9334 ; 0000-0002-5671-0576 ; 0000-0003-0029-6027 ; 0000-0002-1244-2880 ; 0000-0003-4290-1408 ; 0000-0003-4217-7886 ; 0000-0001-6302-813X ; 0000-0002-3934-7605 ; 0009-0007-9559-9800 ; 0000-0002-9807-9479 ; 0009-0005-6700-4713 ; 0009-0008-6054-6036 ; 0000-0002-6559-6111 ; 0000-0003-0718-8045 ; 0000-0001-8618-4581</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,776,780,27898,27899</link.rule.ids></links><search><creatorcontrib>Zhao, Jin</creatorcontrib><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>He, Ligang</creatorcontrib><creatorcontrib>Li, Qikun</creatorcontrib><creatorcontrib>Zhang, Xiang</creatorcontrib><creatorcontrib>Jiang, Xinyu</creatorcontrib><creatorcontrib>Yu, Hui</creatorcontrib><creatorcontrib>Liao, Xiaofei</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><creatorcontrib>Gu, Lin</creatorcontrib><creatorcontrib>Liu, Haikun</creatorcontrib><creatorcontrib>He, Bingsheng</creatorcontrib><creatorcontrib>Zhang, Ji</creatorcontrib><creatorcontrib>Song, Xianzheng</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Zhou, Jun</creatorcontrib><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><title>ACM transactions on architecture and code optimization</title><addtitle>ACM TACO</addtitle><description>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</description><subject>Computer systems organization</subject><subject>Parallel architectures</subject><subject>Special purpose systems</subject><issn>1544-3566</issn><issn>1544-3973</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNo9kEFPAjEQRhujiYjGu6fePK1O2-ku640gIgmJJuJ50y2zWF26pN3V8O8FAU8z883LO3yMXQu4EwL1vUoBIBcnrCc0YqLyTJ0ed52m5-wixk8AmUuAHvuaBLP-mHeeHvjQ83FVOevIt_yR1uQX5O0mMT8mEH_rytgG0xJvGz6sa_p2u2MaAi272gTXbrjzfNR4222zreJPzV9DYylG55eX7KwydaSrw-yz96fxfPSczF4m09Fwlhip8jYZYAYLKTBTAgSi1ZiWJZUgrURrSgTQqCqtUaQLVZE0emB0nmYDQDQkreqz273XhibGQFWxDm5lwqYQUOw6Kg4dbcmbPWns6h86Pn8BQFNhIA</recordid><startdate>20230930</startdate><enddate>20230930</enddate><creator>Zhao, Jin</creator><creator>Zhang, Yu</creator><creator>He, Ligang</creator><creator>Li, Qikun</creator><creator>Zhang, Xiang</creator><creator>Jiang, Xinyu</creator><creator>Yu, Hui</creator><creator>Liao, Xiaofei</creator><creator>Jin, Hai</creator><creator>Gu, Lin</creator><creator>Liu, Haikun</creator><creator>He, Bingsheng</creator><creator>Zhang, Ji</creator><creator>Song, Xianzheng</creator><creator>Wang, Lin</creator><creator>Zhou, Jun</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0001-6033-6102</orcidid><orcidid>https://orcid.org/0000-0002-6525-9334</orcidid><orcidid>https://orcid.org/0000-0002-5671-0576</orcidid><orcidid>https://orcid.org/0000-0003-0029-6027</orcidid><orcidid>https://orcid.org/0000-0002-1244-2880</orcidid><orcidid>https://orcid.org/0000-0003-4290-1408</orcidid><orcidid>https://orcid.org/0000-0003-4217-7886</orcidid><orcidid>https://orcid.org/0000-0001-6302-813X</orcidid><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0009-0007-9559-9800</orcidid><orcidid>https://orcid.org/0000-0002-9807-9479</orcidid><orcidid>https://orcid.org/0009-0005-6700-4713</orcidid><orcidid>https://orcid.org/0009-0008-6054-6036</orcidid><orcidid>https://orcid.org/0000-0002-6559-6111</orcidid><orcidid>https://orcid.org/0000-0003-0718-8045</orcidid><orcidid>https://orcid.org/0000-0001-8618-4581</orcidid></search><sort><creationdate>20230930</creationdate><title>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</title><author>Zhao, Jin ; Zhang, Yu ; He, Ligang ; Li, Qikun ; Zhang, Xiang ; Jiang, Xinyu ; Yu, Hui ; Liao, Xiaofei ; Jin, Hai ; Gu, Lin ; Liu, Haikun ; He, Bingsheng ; Zhang, Ji ; Song, Xianzheng ; Wang, Lin ; Zhou, Jun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Computer systems organization</topic><topic>Parallel architectures</topic><topic>Special purpose systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhao, Jin</creatorcontrib><creatorcontrib>Zhang, Yu</creatorcontrib><creatorcontrib>He, Ligang</creatorcontrib><creatorcontrib>Li, Qikun</creatorcontrib><creatorcontrib>Zhang, Xiang</creatorcontrib><creatorcontrib>Jiang, Xinyu</creatorcontrib><creatorcontrib>Yu, Hui</creatorcontrib><creatorcontrib>Liao, Xiaofei</creatorcontrib><creatorcontrib>Jin, Hai</creatorcontrib><creatorcontrib>Gu, Lin</creatorcontrib><creatorcontrib>Liu, Haikun</creatorcontrib><creatorcontrib>He, Bingsheng</creatorcontrib><creatorcontrib>Zhang, Ji</creatorcontrib><creatorcontrib>Song, Xianzheng</creatorcontrib><creatorcontrib>Wang, Lin</creatorcontrib><creatorcontrib>Zhou, Jun</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on architecture and code optimization</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhao, Jin</au><au>Zhang, Yu</au><au>He, Ligang</au><au>Li, Qikun</au><au>Zhang, Xiang</au><au>Jiang, Xinyu</au><au>Yu, Hui</au><au>Liao, Xiaofei</au><au>Jin, Hai</au><au>Gu, Lin</au><au>Liu, Haikun</au><au>He, Bingsheng</au><au>Zhang, Ji</au><au>Song, Xianzheng</au><au>Wang, Lin</au><au>Zhou, Jun</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing</atitle><jtitle>ACM transactions on architecture and code optimization</jtitle><stitle>ACM TACO</stitle><date>2023-09-30</date><risdate>2023</risdate><volume>20</volume><issue>3</issue><spage>1</spage><epage>24</epage><pages>1-24</pages><issn>1544-3566</issn><eissn>1544-3973</eissn><abstract>With the increasing need for graph analysis, massive Concurrent iterative Graph Processing (CGP) jobs are usually performed on the common large-scale real-world graph. Although several solutions have been proposed, these CGP jobs are not coordinated with the consideration of the inherent dependencies in graph data driven by graph topology. As a result, they suffer from redundant and fragmented accesses of the same underlying graph dispersed over distributed platform, because the same graph is typically irregularly traversed by these jobs along different paths at the same time. In this work, we develop GraphTune, which can be integrated into existing distributed graph processing systems, such as D-Galois, Gemini, PowerGraph, and Chaos, to efficiently perform CGP jobs and enhance system throughput. The key component of GraphTune is a dependency-aware synchronous execution engine in conjunction with several optimization strategies based on the constructed cross-iteration dependency graph of chunks. Specifically, GraphTune transparently regularizes the processing behavior of the CGP jobs in a novel synchronous way and assigns the chunks of graph data to be handled by them based on the topological order of the dependency graph so as to maximize the performance. In this way, it can transform the irregular accesses of the chunks into more regular ones so that as many CGP jobs as possible can fully share the data accesses to the common graph. Meanwhile, it also efficiently synchronizes the communications launched by different CGP jobs based on the dependency graph to minimize the communication cost. We integrate it into four cutting-edge distributed graph processing systems and a popular out-of-core graph processing system to demonstrate the efficiency of GraphTune. Experimental results show that GraphTune improves the throughput of CGP jobs by 3.1 ∼ 6.2, 3.8 ∼ 8.5, 3.5 ∼ 10.8, 4.3 ∼ 12.4, and 3.8 ∼ 6.9 times over D-Galois, Gemini, PowerGraph, and Chaos, and GraphChi, respectively.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3600091</doi><tpages>24</tpages><orcidid>https://orcid.org/0000-0001-6033-6102</orcidid><orcidid>https://orcid.org/0000-0002-6525-9334</orcidid><orcidid>https://orcid.org/0000-0002-5671-0576</orcidid><orcidid>https://orcid.org/0000-0003-0029-6027</orcidid><orcidid>https://orcid.org/0000-0002-1244-2880</orcidid><orcidid>https://orcid.org/0000-0003-4290-1408</orcidid><orcidid>https://orcid.org/0000-0003-4217-7886</orcidid><orcidid>https://orcid.org/0000-0001-6302-813X</orcidid><orcidid>https://orcid.org/0000-0002-3934-7605</orcidid><orcidid>https://orcid.org/0009-0007-9559-9800</orcidid><orcidid>https://orcid.org/0000-0002-9807-9479</orcidid><orcidid>https://orcid.org/0009-0005-6700-4713</orcidid><orcidid>https://orcid.org/0009-0008-6054-6036</orcidid><orcidid>https://orcid.org/0000-0002-6559-6111</orcidid><orcidid>https://orcid.org/0000-0003-0718-8045</orcidid><orcidid>https://orcid.org/0000-0001-8618-4581</orcidid><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1544-3566 |
ispartof | ACM transactions on architecture and code optimization, 2023-09, Vol.20 (3), p.1-24 |
issn | 1544-3566 1544-3973 |
language | eng |
recordid | cdi_crossref_primary_10_1145_3600091 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list) |
subjects | Computer systems organization Parallel architectures Special purpose systems |
title | GraphTune: An Efficient Dependency-aware Substrate to Alleviate Irregularity in Concurrent Graph Processing |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-27T04%3A29%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GraphTune:%20An%20Efficient%20Dependency-aware%20Substrate%20to%20Alleviate%20Irregularity%20in%20Concurrent%20Graph%20Processing&rft.jtitle=ACM%20transactions%20on%20architecture%20and%20code%20optimization&rft.au=Zhao,%20Jin&rft.date=2023-09-30&rft.volume=20&rft.issue=3&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.issn=1544-3566&rft.eissn=1544-3973&rft_id=info:doi/10.1145/3600091&rft_dat=%3Cacm_cross%3E3600091%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a239t-8470d2147310144c546bbeb02c24cab400543f55416d3fe2a58a59678044ae2c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |