
Bactran: A Hardware Batch Normalization Implementation for CNN Training Engine


Bibliographic Details
Published in: IEEE Embedded Systems Letters, 2021-03, Vol. 13 (1), p. 29-32
Main Authors: Zhijie Yang; Lei Wang; Li Luo; Shiming Li; Shasha Guo; Shuquan Wang
Format: Article
Language: English
Description
Summary: In recent years, convolutional neural networks (CNNs) have been widely used. However, their ever-increasing number of parameters makes them challenging to train on GPUs, which is expensive in both time and energy. This has prompted researchers to turn their attention to training on more energy-efficient hardware. The batch normalization (BN) layer is widely used in state-of-the-art CNNs because it is indispensable for accelerating CNN training. As the amount of computation in the convolutional layers declines, the importance of the BN layer continues to increase. However, traditional CNN training accelerators do not pay attention to an efficient hardware implementation of the BN layer. In this letter, we design an efficient CNN training architecture based on a systolic array. The processing elements of the systolic array support the BN functions in both the training process and the inference process. The BN function implemented is range batch normalization (RBN), an improved, hardware-friendly BN algorithm. The experimental results show that the RBN implementation saves 10% of hardware resources and, on average, reduces power by 10.1% and delay by 4.6%. We implement the accelerator on a VU440 field-programmable gate array (FPGA), and the power consumption of its core computing engine is 8.9 W.
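The summary names range batch normalization (RBN) without defining it. As an illustration only, the sketch below assumes the range BN formulation found in the low-precision training literature, in which the per-batch standard deviation is replaced by the range of the centered activations scaled by C(n) = 1 / sqrt(2 ln n). The function name `range_batch_norm` and the parameters `gamma`, `beta`, and `eps` are hypothetical; this is a software reference in floating point, not the letter's hardware design.

```python
import math


def range_batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one channel's activations with the range instead of the std.

    Assumed formulation (not taken from the letter itself):
        x_hat = (x - mean) / (C(n) * range(x - mean)),  C(n) = 1 / sqrt(2 ln n)
    """
    n = len(batch)
    if n < 2:
        raise ValueError("range BN needs at least two samples per batch")

    mean = sum(batch) / n
    centered = [x - mean for x in batch]

    # max - min needs only comparators, with no squaring or square root per
    # sample, which is what makes the range attractive for hardware.
    value_range = max(centered) - min(centered)

    # C(n) depends only on the batch size, so it can be precomputed.
    scale = value_range / math.sqrt(2.0 * math.log(n))

    # eps guards against division by zero when all samples are equal.
    return [gamma * c / (scale + eps) + beta for c in centered]


if __name__ == "__main__":
    print(range_batch_norm([1.0, 2.0, 3.0, 4.0]))
```

Because the scale factor depends only on the maximum, the minimum, and a batch-size constant, the normalized outputs stay zero-centered while avoiding the variance computation of standard BN.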
ISSN: 1943-0663, 1943-0671
DOI: 10.1109/LES.2020.2975055