Scalable and Efficient Full-Graph GNN Training for Large Graphs
Published in: Proceedings of the ACM on Management of Data, June 2023, Vol. 1, No. 2, pp. 1-23, Article 143
Main Authors: , , , , ,
Format: Article
Language: English
Summary: Graph Neural Networks (GNNs) have emerged as powerful tools to capture structural information from graph-structured data, achieving state-of-the-art performance on applications such as recommendation, knowledge graph, and search. Graphs in these domains typically contain hundreds of millions of nodes and billions of edges. However, previous GNN systems demonstrate poor scalability because large and interleaved computation dependencies in GNN training cause significant overhead in current parallelization methods. We present G3, a distributed system that can efficiently train GNNs over billion-edge graphs at scale. G3 introduces GNN hybrid parallelism which synthesizes three dimensions of parallelism to scale out GNN training by sharing intermediate results peer-to-peer in fine granularity, eliminating layer-wise barriers for global collective communication or neighbor replications as seen in prior works. G3 leverages locality-aware iterative partitioning and multi-level pipeline scheduling to exploit acceleration opportunities by distributing balanced workload among workers and overlapping computation with communication in both inter-layer and intra-layer training processes. We show via a prototype implementation and comprehensive experiments that G3 can achieve as much as 2.24x speedup in a 16-node cluster, and better final accuracy over prior works.
ISSN: 2836-6573
DOI: 10.1145/3589288
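
To make the idea in the summary concrete, the following is a minimal single-process sketch of partitioned full-graph GNN training: each worker owns a slice of the nodes, and when a layer needs a neighbor that lives on another partition, its current intermediate feature is fetched from the owning worker rather than replicated locally. This is an illustrative sketch only, not the authors' G3 system; the names `Worker`, `fetch_remote`, and `gcn_layer_step` are hypothetical, and real deployments would replace the in-process fetch with asynchronous peer-to-peer messages.

```python
# Illustrative sketch (not the G3 implementation): one mean-aggregation,
# GCN-style layer over a node-partitioned graph, with boundary features
# fetched peer-to-peer from the owning partition.
import numpy as np

class Worker:
    def __init__(self, owned_nodes, features, edges):
        self.owned = set(owned_nodes)               # node ids this worker owns
        self.h = dict(zip(owned_nodes, features))   # node id -> feature vector
        self.edges = edges                          # (src, dst) pairs, dst owned here

def fetch_remote(workers, node_id):
    """Fetch one intermediate feature from whichever worker owns the node."""
    for w in workers:
        if node_id in w.owned:
            return w.h[node_id]
    raise KeyError(node_id)

def gcn_layer_step(workers, weight):
    """Compute one layer for all partitions, then apply updates synchronously."""
    new_h = []
    for w in workers:
        updates = {}
        for dst in w.owned:
            neigh = [src for src, d in w.edges if d == dst]
            if not neigh:
                agg = w.h[dst]                      # no in-edges: keep own feature
            else:
                feats = [w.h[s] if s in w.owned else fetch_remote(workers, s)
                         for s in neigh]
                agg = np.mean(feats, axis=0)        # mean over neighbor features
            updates[dst] = np.maximum(agg @ weight, 0.0)   # linear + ReLU
        new_h.append(updates)
    for w, upd in zip(workers, new_h):
        w.h = upd                                   # layer-synchronous update

# Toy usage: a 4-node graph split across two workers; node 2 reads node 1
# from the other partition via the peer-to-peer fetch.
w0 = Worker([0, 1], [np.ones(4), np.full(4, 2.0)], [(1, 0)])
w1 = Worker([2, 3], [np.full(4, 3.0), np.full(4, 4.0)], [(1, 2), (3, 2)])
gcn_layer_step([w0, w1], np.eye(4))
```

The design point this sketch isolates is the one named in the summary: remote neighbors are read in fine granularity instead of being replicated into every partition or gathered through a layer-wide collective. In a real distributed setting those fetches would be asynchronous and overlapped with local aggregation, which is the kind of computation/communication overlap the paper's pipeline scheduling targets.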