A Flexible Yet Efficient DNN Pruning Approach for Crossbar-based Processing-In-Memory Architectures
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-11, Vol. 41 (11), p. 1-1
Format: Article
Language: English
Summary: Pruning deep neural networks (DNNs) can reduce the model size and thus save hardware resources of a resistive-random-access-memory (ReRAM) based DNN accelerator. Because of the tightly coupled crossbar structure, existing ReRAM-based pruning techniques prune the weights of a DNN in a structured manner, thereby attaining low pruning ratios. This paper presents a novel technique for pruning the weights of a DNN flexibly on crossbar architectures, maximizing the pruning ratio achieved while preserving crossbar efficiency. We observe that different filters of a weight matrix share a large number of matrix sub-columns (in the same rows), called segments, that can be pruned using the same segment shape, in the sense that the weights at the same column position of these segments are either simultaneously accuracy-sensitive (and should thus be preserved) or simultaneously accuracy-insensitive (and can thus be pruned). Owing to bit-line exchangeability in the crossbar, segments with the same pruning shape can be assembled into the same crossbar to ensure crossbar execution efficiency. We propose a projection-based shape-voting algorithm to select suitable segment shapes to drive the weight-pruning process. Accordingly, we also introduce a low-overhead data path that can be easily integrated into any existing ReRAM-based DNN accelerator, achieving a high pruning ratio and high execution efficiency. Our evaluation shows that our approach outperforms the state-of-the-art techniques, Hybrid-P and FORMAS, by up to 14.6× and 3.6× in pruning ratio, 13.9× and 3.4× in inference speedup, and 12.5× and 3.1× in energy reduction, respectively, while achieving even higher accuracy at the cost of less than 0.27% extra hardware area overhead.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2022.3197510
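
The segment shape-voting idea in the summary lends itself to a short illustration. Below is a minimal Python sketch, not the paper's implementation: it uses weight magnitude as a stand-in for accuracy sensitivity, and the function names (segment_shapes, prune_by_shared_shapes), the segment length, and the fallback for losing shapes are all illustrative assumptions. It shows how each fixed-length sub-column segment can vote for a pruning shape (the set of row offsets it keeps) and how segments sharing a winning shape could then be packed into the same crossbar.

```python
import numpy as np

# Hypothetical sketch of segment shape voting for crossbar-aware pruning.
# Assumptions (not from the paper): weight magnitude approximates accuracy
# sensitivity, and each segment votes for the shape given by its top-k
# largest-magnitude positions.

SEG_LEN = 8   # rows per segment (word-line granularity of the crossbar)
KEEP = 3      # weights kept per segment (pruning ratio = 1 - KEEP/SEG_LEN)

def segment_shapes(W, seg_len=SEG_LEN, keep=KEEP):
    """Split each column of W into length-seg_len segments and let each
    segment 'vote' for a shape: the tuple of row offsets it would keep."""
    rows, cols = W.shape
    assert rows % seg_len == 0
    votes = {}  # shape (tuple of kept offsets) -> list of (row0, col) segments
    for c in range(cols):
        for r0 in range(0, rows, seg_len):
            seg = np.abs(W[r0:r0 + seg_len, c])
            shape = tuple(sorted(np.argsort(seg)[-keep:]))  # top-k offsets
            votes.setdefault(shape, []).append((r0, c))
    return votes

def prune_by_shared_shapes(W, votes, num_shapes=4):
    """Keep only the num_shapes most-voted shapes and prune every segment
    with one of them. Segments whose own vote lost are crudely remapped to
    the most popular shape here; the paper's projection-based voting makes
    this choice more carefully. Segments sharing a shape can then be
    assembled into the same crossbar via bit-line exchange."""
    winners = sorted(votes, key=lambda s: len(votes[s]), reverse=True)[:num_shapes]
    winner_set = set(winners)
    Wp = np.zeros_like(W)
    for shape, segs in votes.items():
        use = shape if shape in winner_set else winners[0]
        for r0, c in segs:
            for off in use:
                Wp[r0 + off, c] = W[r0 + off, c]
    return Wp, winners

W = np.random.randn(32, 16)
votes = segment_shapes(W)
Wp, shapes = prune_by_shared_shapes(W, votes)
print(f"kept {np.count_nonzero(Wp)} of {W.size} weights, "
      f"{len(shapes)} shared segment shapes")
```

With these toy parameters every segment keeps 3 of 8 weights, a 62.5% pruning ratio, and only the few winning shapes need to be stored per crossbar, which is what keeps the data-path overhead low.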