GCNTrain+: A Versatile and Efficient Accelerator for Graph Convolutional Neural Network Training

Bibliographic Details
Published in: ACM Transactions on Architecture and Code Optimization, 2024-11
Main Authors: Song, Zhuoran; Long, Jiabei; Jiang, Li; Jing, Naifeng; Liang, Xiaoyao
Format: Article
Language: English
Summary: Recently, graph convolutional networks (GCNs) have gained wide attention due to their ability to capture node relationships in graphs. A problem arises when a full-batch GCN is trained on large graph datasets, where the computational and memory requirements become prohibitive. To address this issue, mini-batch GCN training improves the scalability of GCN training on large datasets by sampling and training only a subset of the graph in each batch. Although several acceleration techniques have been designed to boost the efficiency of full-batch GCNs, they pay little attention to mini-batch GCNs, which differ from full-batch GCNs in their sampled, dynamic graph structures. Building on our previous work GCNTrain [28], which was originally designed to accelerate full-batch GCN training, we devise GCNTrain+, a universal accelerator that tackles the performance bottlenecks of both full-batch and mini-batch GCN training. GCNTrain+ is equipped with two engines that optimize computation and memory access in GCN training, respectively. To reduce the computation overhead, we propose to dynamically reconfigure the computation order based on the varying data dimensions involved in each training batch. Moreover, we build a unified computation engine that performs the sparse-dense matrix multiplications (SpDM) and sparse-sparse matrix multiplications (SpSpM) arising in GCN training in a uniform manner. To alleviate the memory burden, we devise a two-phased dynamic clustering mechanism that captures data locality, along with customized hardware that reduces the clustering overhead. We evaluate GCNTrain+ on seven datasets, and the results show that GCNTrain+ achieves 136.0×, 52.6×, 2.2×, and 1.5× speedups over CPU, GPU, GCNAX, and GCNTrain, respectively, in full-batch GCN training. Additionally, GCNTrain+ outperforms them with speedups of 131.6×, 67.1×, 4.4×, and 1.5× in mini-batch GCN training.
ISSN: 1544-3566, 1544-3973
DOI: 10.1145/3705317
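
Note: The dimension-aware reordering mentioned in the abstract can be illustrated with a small sketch. The Python snippet below is a minimal, hypothetical illustration, not the paper's implementation: for one GCN layer computing A·X·W, it picks between (A·X)·W and A·(X·W) using a rough multiply-count estimate, the kind of per-batch decision the abstract describes. The function name, cost model, and toy sizes are assumptions.

    import numpy as np
    import scipy.sparse as sp

    def gcn_layer(A, X, W):
        """Sketch of dimension-aware ordering for one GCN layer A @ X @ W.

        A: sparse normalized adjacency (N x N), X: dense features (N x F),
        W: dense weights (F x H). Picks (A @ X) @ W or A @ (X @ W) by a rough
        multiply-count estimate (hypothetical cost model, not from the paper).
        """
        N, F = X.shape
        H = W.shape[1]
        nnz = A.nnz
        cost_ax_first = nnz * F + N * F * H   # sparse-dense (A @ X), then dense @ W
        cost_xw_first = N * F * H + nnz * H   # dense (X @ W), then sparse A @ dense
        if cost_ax_first <= cost_xw_first:
            return (A @ X) @ W
        return A @ (X @ W)

    # Toy usage: a sampled subgraph roughly the size of one mini-batch
    A = sp.random(1000, 1000, density=0.005, format="csr")
    X = np.random.rand(1000, 128)
    W = np.random.rand(128, 16)
    out = gcn_layer(A, X, W)   # shape (1000, 16)

In a mini-batch setting the sampled subgraph changes every batch, so the cheaper ordering can flip from batch to batch; that variability is what makes a per-batch decision, rather than a fixed dataflow, attractive.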