
Scalable and Efficient Full-Graph GNN Training for Large Graphs

Bibliographic Details
Published in: Proceedings of the ACM on Management of Data, 2023-06, Vol. 1 (2), p. 1-23, Article 143
Main Authors: Wan, Xinchen; Xu, Kaiqiang; Liao, Xudong; Jin, Yilun; Chen, Kai; Jin, Xin
Format: Article
Language:English
Subjects: Computing methodologies; Data management systems; Distributed computing methodologies; Information systems
Publisher: New York, NY, USA: ACM
DOI: 10.1145/3589288
ISSN: 2836-6573
EISSN: 2836-6573
Abstract:
Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.
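To make the training pattern in the abstract concrete, the sketch below simulates full-graph GNN training over a partitioned graph: each worker owns a subgraph, and the intermediate embeddings of boundary nodes must be shared with the neighboring partition between layers, rather than replicated or gathered through a global collective. This is a minimal, self-contained illustration and not the authors' G3 implementation; the toy two-worker partition, the mean-aggregation layer, and all names (parts, boundary_sends, layer_forward) are assumptions made for exposition.

```python
# Minimal sketch (hypothetical, not G3 itself) of partitioned full-graph
# GNN training with peer-to-peer boundary-embedding exchange.
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: 6 nodes in a cycle, partitioned across two workers.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
parts = {0: [0, 1, 2], 1: [3, 4, 5]}                  # worker -> owned nodes
owner = {v: wk for wk, vs in parts.items() for v in vs}

feat_dim = 4
H = rng.normal(size=(6, feat_dim))           # current layer's node embeddings
W = rng.normal(size=(feat_dim, feat_dim))    # shared layer weight

def boundary_sends(worker):
    """Which of this worker's node embeddings each peer needs: exactly the
    owned nodes that have a neighbor in a remote partition."""
    out = {}
    for u, v in edges + [(v, u) for (u, v) in edges]:
        if owner[u] == worker and owner[v] != worker:
            out.setdefault(owner[v], set()).add(u)
    return out

def layer_forward(H):
    """One mean-aggregation GNN layer, computed partition by partition.
    In a distributed run, the remote rows of H read here would arrive via
    fine-grained peer-to-peer messages from boundary_sends()."""
    H_next = np.zeros_like(H)
    for worker, nodes in parts.items():
        for v in nodes:
            nbrs = [u for (u, t) in edges if t == v] + \
                   [t for (u, t) in edges if u == v]
            agg = H[nbrs].mean(axis=0)       # aggregate neighbor embeddings
            H_next[v] = np.tanh(agg @ W)     # update with the layer weight
    return H_next

H1 = layer_forward(H)
print("worker 0 sends to worker 1:", boundary_sends(0))
print("layer-1 embeddings shape:", H1.shape)
```

In a real system the point of the fine-grained exchange is that each boundary embedding can be sent as soon as it is computed, so communication for one part of a layer can overlap with computation on another, which is the opportunity the abstract's multi-level pipeline scheduling exploits; the toy code above only shows what must be communicated, not that overlap.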