
A Hierarchical and Reconfigurable Process Element Design for Quantized Neural Networks

Bibliographic Details
Main Authors: Chen, Yu-Guang, Hsu, Chi-Wei, Chiang, Hung-Yi, Hsieh, Tsung-Han, Jou, Jing-Yang
Format: Conference Proceeding
Language: English
Description
Summary: Convolutional neural networks (CNNs) are widely used across many applications, but data size and accuracy are two major concerns for efficient and effective computation. Conventional CNN models frequently use 32-bit data to maintain high accuracy; however, performing large numbers of 32-bit multiply-and-accumulate (MAC) operations incurs significant computational effort and power consumption. Researchers have therefore developed various methods to reduce data size and speed up computation. Quantization is one such technique: it reduces the bit width of the data, and hence the computational complexity, at the cost of some accuracy loss. To provide a better trade-off between computation effort and accuracy, different bit widths may be applied to different layers within a CNN model, so a flexible Processing Element (PE) that supports operations of different bit widths is in demand. In this paper, we propose a hierarchical PE structure that supports 8-bit x 8-bit, 8-bit x 4-bit, 4-bit x 4-bit, and 2-bit x 2-bit operations. The structure applies the concept of reconfiguration while avoiding redundant reconfiguration hardware. Moreover, pipelining is adopted in our design to provide better efficiency. Experimental results show that for the 2-bit x 2-bit PE, our design achieves area reductions of 57% and 68% compared to a Precision-Scalable accelerator and Bit Fusion, respectively.
ISSN: 2164-1706
DOI: 10.1109/SOCC52499.2021.9739487
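
The bit-width fusion that reconfigurable PEs of this kind exploit can be illustrated in software: a wide multiplication decomposes into narrower partial products that are shifted and accumulated, so the same low-bit-width multipliers can serve several precisions. The C sketch below is only a minimal illustration of that arithmetic, assuming 4-bit sub-multipliers and hypothetical function names; it is not the authors' hierarchical PE design, which additionally avoids redundant reconfiguration hardware and applies pipelining.

#include <stdint.h>
#include <stdio.h>
#include <assert.h>

/* Illustrative sketch only (not the paper's PE design): an unsigned
 * 8-bit x 8-bit multiply decomposed into four 4-bit x 4-bit partial
 * products, the arithmetic principle that lets narrow multipliers be
 * fused into wider operations in precision-scalable PEs. */

/* Stand-in for a narrow 4-bit x 4-bit hardware multiplier. */
static uint16_t mul4x4(uint8_t a, uint8_t b) {
    return (uint16_t)((a & 0xF) * (b & 0xF));
}

/* Fuse four 4x4 partial products into one 8x8 product:
 * a*b = (aH*bH << 8) + ((aH*bL + aL*bH) << 4) + aL*bL */
static uint16_t mul8x8_fused(uint8_t a, uint8_t b) {
    uint8_t aL = a & 0xF, aH = a >> 4;
    uint8_t bL = b & 0xF, bH = b >> 4;
    return (uint16_t)((mul4x4(aH, bH) << 8)
                    + ((mul4x4(aH, bL) + mul4x4(aL, bH)) << 4)
                    +   mul4x4(aL, bL));
}

int main(void) {
    /* Exhaustively check the decomposition against a plain multiply. */
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            assert(mul8x8_fused((uint8_t)a, (uint8_t)b) == a * b);
    printf("8x8 multiply reproduced from four 4x4 partial products.\n");
    return 0;
}

The same decomposition applies recursively, which is why a hierarchy of small units (down to 2-bit x 2-bit) can be configured to serve 4-bit and 8-bit operands as well.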