Scalable and Efficient Full-Graph GNN Training for Large Graphs
Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism, which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.
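To make the ideas in the abstract concrete, below is a minimal, single-process sketch of two of them: graph partitions exchange boundary-node embeddings peer-to-peer while purely local aggregation proceeds in parallel, so communication overlaps with computation. This is only an illustration of the general technique, not G3's implementation; every name in it (Partition, train_one_layer, and so on) is hypothetical, it uses threads in one process rather than distributed workers, and a real system would additionally pipeline across layers.

```python
# Illustrative sketch (not G3's code): two graph partitions exchange
# boundary-node embeddings peer-to-peer while local aggregation proceeds,
# mimicking the "overlap computation with communication" idea from the abstract.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

class Partition:
    """One worker's slice of the graph: the nodes it owns plus every edge
    whose destination is local. Edge sources owned by the peer are 'boundary' nodes."""
    def __init__(self, local_ids, edges, feat_dim, rng):
        self.local_ids = list(local_ids)          # nodes this worker owns
        self.edges = edges                        # (src, dst) pairs with dst local
        self.h = {n: rng.standard_normal(feat_dim) for n in self.local_ids}
        # boundary nodes: edge sources owned by some other partition
        self.boundary = sorted({s for s, _ in edges if s not in self.h})

    def outgoing(self, requested_ids):
        """Embeddings the peer asked for (simulated peer-to-peer send)."""
        return {n: self.h[n] for n in requested_ids if n in self.h}

    def local_aggregate(self):
        """Sum-aggregate over edges whose source is already local."""
        agg = {n: np.zeros_like(next(iter(self.h.values()))) for n in self.local_ids}
        for s, d in self.edges:
            if s in self.h:
                agg[d] += self.h[s]
        return agg

    def finish_layer(self, agg, remote_h):
        """Fold in remote boundary embeddings, then apply a toy mean + ReLU update."""
        for s, d in self.edges:
            if s in remote_h:
                agg[d] += remote_h[s]
        for n in self.local_ids:
            deg = max(1, sum(1 for _, d in self.edges if d == n))
            self.h[n] = np.maximum(agg[n] / deg, 0.0)

def train_one_layer(p0, p1):
    """Overlap local aggregation with the peer-to-peer boundary exchange."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # start the "communication": each partition fetches what it needs from its peer
        recv0 = pool.submit(p1.outgoing, p0.boundary)
        recv1 = pool.submit(p0.outgoing, p1.boundary)
        # meanwhile, do the purely local part of the aggregation
        agg0, agg1 = p0.local_aggregate(), p1.local_aggregate()
        # wait for the boundary embeddings, then finish the layer
        p0.finish_layer(agg0, recv0.result())
        p1.finish_layer(agg1, recv1.result())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # tiny graph: nodes 0-3; partition 0 owns {0, 1}, partition 1 owns {2, 3}
    edges = [(0, 1), (2, 1), (1, 2), (3, 2), (0, 3)]
    p0 = Partition([0, 1], [(s, d) for s, d in edges if d in (0, 1)], 4, rng)
    p1 = Partition([2, 3], [(s, d) for s, d in edges if d in (2, 3)], 4, rng)
    train_one_layer(p0, p1)
    print("partition 0 embeddings:", {k: v.round(2) for k, v in p0.h.items()})
```

The point of the sketch is only that the boundary exchange is issued before the local aggregation runs, so the two can overlap; the abstract attributes this kind of overlap, at a much finer granularity and across layers, to G3's hybrid parallelism and multi-level pipeline scheduling.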
Published in: | Proceedings of the ACM on management of data 2023-06, Vol.1 (2), p.1-23, Article 143 |
---|---|
Main Authors: | Wan, Xinchen; Xu, Kaiqiang; Liao, Xudong; Jin, Yilun; Chen, Kai; Jin, Xin |
Format: | Article |
Language: | English |
Subjects: | Computing methodologies; Data management systems; Distributed computing methodologies; Information systems |
Online Access: | https://doi.org/10.1145/3589288 |

container_end_page | 23 |
---|---|
container_issue | 2 |
container_start_page | 1 |
container_title | Proceedings of the ACM on management of data |
container_volume | 1 |
creator | Wan, Xinchen; Xu, Kaiqiang; Liao, Xudong; Jin, Yilun; Chen, Kai; Jin, Xin |
description | Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works. |
doi_str_mv | 10.1145/3589288 |
format | article |
publisher | ACM, New York, NY, USA |
publication_date | 2023-06-20 |
orcid | 0000-0003-0501-5968; 0000-0002-9502-7622; 0000-0001-8741-5847; 0000-0001-6503-5309; 0000-0003-2587-6028; 0000-0002-8380-1879 |
identifier | ISSN: 2836-6573 |
ispartof | Proceedings of the ACM on management of data, 2023-06, Vol.1 (2), p.1-23, Article 143 |
issn | 2836-6573 |
language | eng |
source | ACM Digital Library |
subjects | Computing methodologies; Data management systems; Distributed computing methodologies; Information systems |
title | Scalable and Efficient Full-Graph GNN Training for Large Graphs |