Scalable and Efficient Full-Graph GNN Training for Large Graphs
Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism, which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.
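To make the ideas in the abstract concrete, below is a minimal, single-process sketch of two of them: graph partitions exchange boundary-node embeddings peer-to-peer while purely local aggregation proceeds in parallel, so communication overlaps with computation. This is only an illustration of the general technique, not G3's implementation; every name in it (Partition, train_one_layer, and so on) is hypothetical, it uses threads in one process rather than distributed workers, and a real system would additionally pipeline across layers.

```python
# Illustrative sketch (not G3's code): two graph partitions exchange
# boundary-node embeddings peer-to-peer while local aggregation proceeds,
# mimicking the "overlap computation with communication" idea from the abstract.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

class Partition:
    """One worker's slice of the graph: the nodes it owns plus every edge
    whose destination is local. Edge sources owned by the peer are 'boundary' nodes."""
    def __init__(self, local_ids, edges, feat_dim, rng):
        self.local_ids = list(local_ids)          # nodes this worker owns
        self.edges = edges                        # (src, dst) pairs with dst local
        self.h = {n: rng.standard_normal(feat_dim) for n in self.local_ids}
        # boundary nodes: edge sources owned by some other partition
        self.boundary = sorted({s for s, _ in edges if s not in self.h})

    def outgoing(self, requested_ids):
        """Embeddings the peer asked for (simulated peer-to-peer send)."""
        return {n: self.h[n] for n in requested_ids if n in self.h}

    def local_aggregate(self):
        """Sum-aggregate over edges whose source is already local."""
        agg = {n: np.zeros_like(next(iter(self.h.values()))) for n in self.local_ids}
        for s, d in self.edges:
            if s in self.h:
                agg[d] += self.h[s]
        return agg

    def finish_layer(self, agg, remote_h):
        """Fold in remote boundary embeddings, then apply a toy mean + ReLU update."""
        for s, d in self.edges:
            if s in remote_h:
                agg[d] += remote_h[s]
        for n in self.local_ids:
            deg = max(1, sum(1 for _, d in self.edges if d == n))
            self.h[n] = np.maximum(agg[n] / deg, 0.0)

def train_one_layer(p0, p1):
    """Overlap local aggregation with the peer-to-peer boundary exchange."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # start the "communication": each partition fetches what it needs from its peer
        recv0 = pool.submit(p1.outgoing, p0.boundary)
        recv1 = pool.submit(p0.outgoing, p1.boundary)
        # meanwhile, do the purely local part of the aggregation
        agg0, agg1 = p0.local_aggregate(), p1.local_aggregate()
        # wait for the boundary embeddings, then finish the layer
        p0.finish_layer(agg0, recv0.result())
        p1.finish_layer(agg1, recv1.result())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # tiny graph: nodes 0-3; partition 0 owns {0, 1}, partition 1 owns {2, 3}
    edges = [(0, 1), (2, 1), (1, 2), (3, 2), (0, 3)]
    p0 = Partition([0, 1], [(s, d) for s, d in edges if d in (0, 1)], 4, rng)
    p1 = Partition([2, 3], [(s, d) for s, d in edges if d in (2, 3)], 4, rng)
    train_one_layer(p0, p1)
    print("partition 0 embeddings:", {k: v.round(2) for k, v in p0.h.items()})
```

The point of the sketch is only that the boundary exchange is issued before the local aggregation runs, so the two can overlap; the abstract attributes this kind of overlap, at a much finer granularity and across layers, to G3's hybrid parallelism and multi-level pipeline scheduling.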
Published in: | Proceedings of the ACM on management of data 2023-06, Vol.1 (2), p.1-23, Article 143 |
---|---|
Main Authors: | Wan, Xinchen; Xu, Kaiqiang; Liao, Xudong; Jin, Yilun; Chen, Kai; Jin, Xin |
Format: | Article |
Language: | English |
Subjects: | Computing methodologies; Data management systems; Distributed computing methodologies; Information systems |
Online Access: | https://doi.org/10.1145/3589288 |

container_end_page | 23 |
---|---|
container_issue | 2 |
container_start_page | 1 |
container_title | Proceedings of the ACM on management of data |
container_volume | 1 |
creator | Wan, Xinchen; Xu, Kaiqiang; Liao, Xudong; Jin, Yilun; Chen, Kai; Jin, Xin |
description | Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works. |
doi_str_mv | 10.1145/3589288 |
format | article |
publisher | ACM, New York, NY, USA |
publication_date | 2023-06-20 |
orcid | 0000-0003-0501-5968; 0000-0002-9502-7622; 0000-0001-8741-5847; 0000-0001-6503-5309; 0000-0003-2587-6028; 0000-0002-8380-1879 |
identifier | ISSN: 2836-6573 |
ispartof | Proceedings of the ACM on management of data, 2023-06, Vol.1 (2), p.1-23, Article 143 |
issn | 2836-6573 |
language | eng |
source | ACM Digital Library |
subjects | Computing methodologies; Data management systems; Distributed computing methodologies; Information systems |
title | Scalable and Efficient Full-Graph GNN Training for Large Graphs |