NeutronOrch: Rethinking Sample-Based GNN Training under CPU-GPU Heterogeneous Environments

Bibliographic Details
Published in:	Proceedings of the VLDB Endowment, Vol. 17, No. 8 (April 2024), pp. 1995-2008
Main Authors:	Ai, Xin, Wang, Qiange, Cao, Chunyu, Zhang, Yanfeng, Chen, Chaoyi, Yuan, Hao, Gu, Yu, Yu, Ge
Format:	Article
Language:	English

Description
Summary:	Graph Neural Networks (GNNs) have shown exceptional performance across a wide range of applications. Current frameworks leverage CPU-GPU heterogeneous environments for GNN model training, incorporating mini-batch and sampling techniques to mitigate GPU memory constraints. In such settings, sample-based GNN training can be divided into three phases: sampling, gathering, and training. Existing GNN systems deploy various task orchestration methods to execute each phase on either the CPU or the GPU. However, through comprehensive experimentation and analysis, we observe that these task orchestration approaches do not optimally exploit the available heterogeneous resources, being hindered by either inefficient CPU processing or GPU resource bottlenecks. In this paper, we propose NeutronOrch, a system for sample-based GNN training that ensures balanced utilization of the CPU and GPU. NeutronOrch decouples the training process by layer and pushes the training task of the bottom layer down to the CPU. This significantly reduces the computational load and memory footprint of GPU training. To avoid inefficient CPU processing, NeutronOrch offloads only the training of frequently accessed vertices to the CPU and lets the GPU reuse their embeddings with bounded staleness. Furthermore, NeutronOrch provides a fine-grained pipeline design for the layer-based task orchestration method. The experimental results show that, compared with state-of-the-art GNN systems, NeutronOrch achieves up to an 11.51× speedup.
ISSN:	2150-8097
DOI:	10.14778/3659437.3659453
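The summary's central mechanism is that only frequently accessed ("hot") vertices have their bottom-layer embeddings produced on the CPU, and the GPU reuses those embeddings as long as they are no older than a staleness bound. The sketch below illustrates that idea only and is not NeutronOrch's implementation. It assumes that hotness is measured by how often a vertex appears in sampled mini-batches, that staleness is counted in training iterations, and that embeddings are plain float lists. The names `select_hot_vertices`, `BoundedStalenessCache`, and `split_batch` are hypothetical. No CPU or GPU computation happens here; the code only decides which embeddings could be reused and which would need recomputation.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Optional, Sequence, Tuple

Vertex = Hashable
Embedding = List[float]


def select_hot_vertices(sampled_batches: Iterable[Sequence[Vertex]], top_k: int) -> List[Vertex]:
    """Hypothetical hotness metric: rank vertices by how often they were sampled."""
    counts = Counter(v for batch in sampled_batches for v in batch)
    return [v for v, _ in counts.most_common(top_k)]


class BoundedStalenessCache:
    """Hypothetical cache of bottom-layer embeddings for hot vertices only.

    Each entry records the iteration at which it was computed; an entry is
    reusable only while (current_iteration - computed_at) <= staleness_bound.
    """

    def __init__(self, hot_vertices: Iterable[Vertex], staleness_bound: int) -> None:
        self.hot = set(hot_vertices)
        self.bound = staleness_bound
        self.store: Dict[Vertex, Tuple[Embedding, int]] = {}

    def put(self, v: Vertex, emb: Embedding, iteration: int) -> None:
        # Only hot vertices are cached; embeddings for other vertices are dropped.
        if v in self.hot:
            self.store[v] = (emb, iteration)

    def get(self, v: Vertex, iteration: int) -> Optional[Embedding]:
        entry = self.store.get(v)
        if entry is None:
            return None
        emb, computed_at = entry
        if iteration - computed_at > self.bound:
            return None  # too stale: treat as a miss
        return emb


def split_batch(
    batch: Sequence[Vertex], cache: BoundedStalenessCache, iteration: int
) -> Tuple[Dict[Vertex, Embedding], List[Vertex]]:
    """Partition a batch into reusable cached embeddings and vertices to recompute."""
    reused: Dict[Vertex, Embedding] = {}
    to_compute: List[Vertex] = []
    for v in batch:
        emb = cache.get(v, iteration)
        if emb is None:
            to_compute.append(v)
        else:
            reused[v] = emb
    return reused, to_compute


if __name__ == "__main__":
    hot = select_hot_vertices([[1, 2, 3], [1, 2], [1, 4]], top_k=2)
    cache = BoundedStalenessCache(hot, staleness_bound=2)
    cache.put(1, [0.1, 0.2], iteration=0)  # stands in for a CPU-produced embedding
    cache.put(3, [0.5, 0.5], iteration=0)  # ignored: vertex 3 is not hot
    print(split_batch([1, 2, 3], cache, iteration=2))  # vertex 1 is reused
    print(split_batch([1], cache, iteration=3))        # vertex 1 is now too stale
```

In this toy run, vertex 1's embedding written at iteration 0 is still accepted at iteration 2 under a bound of 2 but rejected at iteration 3, and vertex 3 is never cached because it is not among the hot vertices. Under these assumptions, the bound is the knob that trades embedding freshness for skipped bottom-layer work on the GPU.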