Scalable and Efficient Full-Graph GNN Training for Large Graphs
Published in: Proceedings of the ACM on Management of Data, June 2023, Vol. 1, No. 2, pp. 1-23, Article 143
Main Authors: , , , , ,
Format: Article
Language: English
Summary: Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.
ISSN: 2836-6573
DOI: 10.1145/3589288
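
To make the idea in the summary concrete, the following is a minimal single-process sketch of partitioned full-graph GNN training: each worker owns a slice of the nodes, and when a layer needs a neighbor that lives on another partition, its current intermediate feature is fetched from the owning worker rather than replicated locally. This is an illustrative sketch only, not the authors' G3 system; the names `Worker`, `fetch_remote`, and `gcn_layer_step` are hypothetical, and real deployments would replace the in-process fetch with asynchronous peer-to-peer messages.

```python
# Illustrative sketch (not the G3 implementation): one mean-aggregation,
# GCN-style layer over a node-partitioned graph, with boundary features
# fetched peer-to-peer from the owning partition.
import numpy as np

class Worker:
    def __init__(self, owned_nodes, features, edges):
        self.owned = set(owned_nodes)               # node ids this worker owns
        self.h = dict(zip(owned_nodes, features))   # node id -> feature vector
        self.edges = edges                          # (src, dst) pairs, dst owned here

def fetch_remote(workers, node_id):
    """Fetch one intermediate feature from whichever worker owns the node."""
    for w in workers:
        if node_id in w.owned:
            return w.h[node_id]
    raise KeyError(node_id)

def gcn_layer_step(workers, weight):
    """Compute one layer for all partitions, then apply updates synchronously."""
    new_h = []
    for w in workers:
        updates = {}
        for dst in w.owned:
            neigh = [src for src, d in w.edges if d == dst]
            if not neigh:
                agg = w.h[dst]                      # no in-edges: keep own feature
            else:
                feats = [w.h[s] if s in w.owned else fetch_remote(workers, s)
                         for s in neigh]
                agg = np.mean(feats, axis=0)        # mean over neighbor features
            updates[dst] = np.maximum(agg @ weight, 0.0)   # linear + ReLU
        new_h.append(updates)
    for w, upd in zip(workers, new_h):
        w.h = upd                                   # layer-synchronous update

# Toy usage: a 4-node graph split across two workers; node 2 reads node 1
# from the other partition via the peer-to-peer fetch.
w0 = Worker([0, 1], [np.ones(4), np.full(4, 2.0)], [(1, 0)])
w1 = Worker([2, 3], [np.full(4, 3.0), np.full(4, 4.0)], [(1, 2), (3, 2)])
gcn_layer_step([w0, w1], np.eye(4))
```

The design point this sketch isolates is the one named in the summary: remote neighbors are read in fine granularity instead of being replicated into every partition or gathered through a layer-wide collective. In a real distributed setting those fetches would be asynchronous and overlapped with local aggregation, which is the kind of computation/communication overlap the paper's pipeline scheduling targets.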