A Flexible Yet Efficient DNN Pruning Approach for Crossbar-based Processing-In-Memory Architectures
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-11, Vol. 41 (11), p. 1-1
Format: Article
Language: English
Summary: Pruning deep neural networks (DNNs) can reduce the model size and thus save hardware resources of a resistive-random-access-memory (ReRAM) based DNN accelerator. Because of the tightly coupled crossbar structure, existing ReRAM-based pruning techniques prune the weights of a DNN in a structured manner, thereby attaining low pruning ratios. This paper presents a novel technique for pruning the weights of a DNN flexibly on crossbar architectures, maximizing the pruning ratio achieved while preserving crossbar efficiency. We observe that different filters of a weight matrix share a large number of matrix sub-columns (in the same rows), called segments, that can be pruned using the same segment shape, in the sense that the weights at the same column position of these segments are either simultaneously accuracy-sensitive (and should thus be preserved) or simultaneously accuracy-insensitive (and can thus be pruned). Owing to bit-line exchangeability in the crossbar, segments with the same pruning shape can be assembled into the same crossbar to ensure crossbar execution efficiency. We propose a projection-based shape-voting algorithm to select suitable segment shapes to drive the weight-pruning process. Accordingly, we also introduce a low-overhead data path that can be easily integrated into any existing ReRAM-based DNN accelerator, achieving a high pruning ratio and high execution efficiency. Our evaluation shows that our approach outperforms the state-of-the-art techniques, Hybrid-P and FORMAS, by up to 14.6× and 3.6× in pruning ratio, 13.9× and 3.4× in inference speedup, and 12.5× and 3.1× in energy reduction, respectively, while achieving even higher accuracy at the cost of less than 0.27% extra hardware area overhead.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2022.3197510
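
The segment shape-voting idea in the summary lends itself to a short illustration. Below is a minimal Python sketch, not the paper's implementation: it uses weight magnitude as a stand-in for accuracy sensitivity, and the function names (segment_shapes, prune_by_shared_shapes), the segment length, and the fallback for losing shapes are all illustrative assumptions. It shows how each fixed-length sub-column segment can vote for a pruning shape (the set of row offsets it keeps) and how segments sharing a winning shape could then be packed into the same crossbar.

```python
import numpy as np

# Hypothetical sketch of segment shape voting for crossbar-aware pruning.
# Assumptions (not from the paper): weight magnitude approximates accuracy
# sensitivity, and each segment votes for the shape given by its top-k
# largest-magnitude positions.

SEG_LEN = 8   # rows per segment (word-line granularity of the crossbar)
KEEP = 3      # weights kept per segment (pruning ratio = 1 - KEEP/SEG_LEN)

def segment_shapes(W, seg_len=SEG_LEN, keep=KEEP):
    """Split each column of W into length-seg_len segments and let each
    segment 'vote' for a shape: the tuple of row offsets it would keep."""
    rows, cols = W.shape
    assert rows % seg_len == 0
    votes = {}  # shape (tuple of kept offsets) -> list of (row0, col) segments
    for c in range(cols):
        for r0 in range(0, rows, seg_len):
            seg = np.abs(W[r0:r0 + seg_len, c])
            shape = tuple(sorted(np.argsort(seg)[-keep:]))  # top-k offsets
            votes.setdefault(shape, []).append((r0, c))
    return votes

def prune_by_shared_shapes(W, votes, num_shapes=4):
    """Keep only the num_shapes most-voted shapes and prune every segment
    with one of them. Segments whose own vote lost are crudely remapped to
    the most popular shape here; the paper's projection-based voting makes
    this choice more carefully. Segments sharing a shape can then be
    assembled into the same crossbar via bit-line exchange."""
    winners = sorted(votes, key=lambda s: len(votes[s]), reverse=True)[:num_shapes]
    winner_set = set(winners)
    Wp = np.zeros_like(W)
    for shape, segs in votes.items():
        use = shape if shape in winner_set else winners[0]
        for r0, c in segs:
            for off in use:
                Wp[r0 + off, c] = W[r0 + off, c]
    return Wp, winners

W = np.random.randn(32, 16)
votes = segment_shapes(W)
Wp, shapes = prune_by_shared_shapes(W, votes)
print(f"kept {np.count_nonzero(Wp)} of {W.size} weights, "
      f"{len(shapes)} shared segment shapes")
```

With these toy parameters every segment keeps 3 of 8 weights, a 62.5% pruning ratio, and only the few winning shapes need to be stored per crossbar, which is what keeps the data-path overhead low.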