
Using Supercomputer to Speed up Neural Network Training

Bibliographic Details
Main Authors: Yue Yu, Jinrong Jiang, Xuebin Chi
Format: Conference Proceeding
Language: English; Japanese
Description
Summary: Recent work in deep learning has shown that large models can dramatically improve performance. In this paper, we accelerate deep network training using many GPUs. We have developed a framework based on Caffe, called Caffe-HPC, that can utilize computing clusters with multiple GPUs to train large models. Caffe[6] provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. Caffe-HPC retains all the features of the original Caffe, and a model trained with the original Caffe can continue to be trained with Caffe-HPC. This provides a convenient solution for people who already use Caffe and want to speed up training. Using an Asynchronous Stochastic Gradient Descent optimizer, we achieved good acceleration when training a CNN model on the ILSVRC[5] 2012 dataset, and we compared the convergence of different SGD algorithms. We believe our work makes it possible to train larger networks on larger training sets in a reasonable amount of time.
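The asynchronous SGD scheme the abstract mentions can be sketched in miniature. The following is a hypothetical toy illustration, not the authors' Caffe-HPC code: several worker threads each process their own data shard and apply gradient updates to a shared parameter vector without any synchronization (a Hogwild!-style lock-free update), shown here on a small least-squares problem rather than a CNN.

```python
import threading
import numpy as np

def make_data(n=200, d=5, seed=0):
    """Generate a noise-free linear regression problem (toy stand-in
    for a real training set such as ILSVRC 2012)."""
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    y = X @ w_true
    return X, y, w_true

def async_sgd(X, y, n_workers=4, epochs=50, lr=0.01):
    """Asynchronous SGD sketch: workers update shared parameters
    concurrently, each reading possibly stale values."""
    d = X.shape[1]
    w = np.zeros(d)  # shared parameter vector, updated lock-free
    shards = np.array_split(np.arange(len(X)), n_workers)

    def worker(indices):
        # Each worker iterates over its own shard and writes
        # gradient updates into the shared w without locking.
        for _ in range(epochs):
            for i in indices:
                grad = (X[i] @ w - y[i]) * X[i]   # least-squares gradient
                np.subtract(w, lr * grad, out=w)  # in-place shared update

    threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

if __name__ == "__main__":
    X, y, w_true = make_data()
    w = async_sgd(X, y)
    print("parameter error:", np.linalg.norm(w - w_true))
```

On a convex problem like this, the lock-free updates still converge close to the true parameters despite workers reading stale values; the paper's contribution is applying this kind of asynchronous scheme to large CNN training across a GPU cluster, where stale gradients are the main convergence concern.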
ISSN: 1521-9097, 2690-5965
DOI: 10.1109/ICPADS.2016.0126