CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024-01, Vol. 43 (1), p. 1-1
Main Authors:
Format: Article
Language: English
Summary: The emerging Computing-In-Memory (CIM) technology has demonstrated significant potential for enhancing the performance and efficiency of convolutional neural networks (CNNs). However, because memory devices and data interfaces offer only low precision, an additional quantization step is necessary, and conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This paper proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework targets the fundamental computing elements of CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NN terms) with four techniques. First, bit-level-sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an array-wise quantization granularity is proposed for weights. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the framework is demonstrated experimentally on various NNs and datasets (CIFAR-10, CIFAR-100, ImageNet); in typical cases, hardware efficiency improves by up to 222% with a 58.97% improvement in accuracy compared with conventional quantization methods.
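The abstract only names the techniques; to make two of them concrete, here is a minimal NumPy sketch of (a) array-wise weight quantization, where each crossbar-sized tile of the weight matrix gets its own scale, and (b) partial-sum quantization with a clipping threshold ahead of a low-resolution ADC. The tile shape, the fixed `clip_alpha`, the bit widths, and all function names are illustrative assumptions, not the paper's implementation; in CIMQ the clipping threshold is reparametrized and learned rather than fixed as it is here.

```python
import numpy as np

def quantize_uniform(x, scale, n_bits):
    """Uniform symmetric quantization to n_bits with a given scale."""
    q_max = 2 ** (n_bits - 1) - 1
    q = np.clip(np.round(x / scale), -q_max, q_max)
    return q * scale

def array_wise_weight_quant(weights, array_rows=128, array_cols=128, n_bits=4):
    """Quantize a 2-D weight matrix tile by tile, one scale per CIM array.

    Each (array_rows x array_cols) tile is assumed to map onto one crossbar
    array, so it gets its own quantization scale instead of a single
    per-layer or per-channel scale.
    """
    out = np.empty_like(weights, dtype=np.float32)
    q_max = 2 ** (n_bits - 1) - 1
    rows, cols = weights.shape
    for r in range(0, rows, array_rows):
        for c in range(0, cols, array_cols):
            tile = weights[r:r + array_rows, c:c + array_cols]
            scale = np.abs(tile).max() / q_max + 1e-12  # per-array scale
            out[r:r + array_rows, c:c + array_cols] = quantize_uniform(tile, scale, n_bits)
    return out

def clipped_partial_sum_quant(partial_sums, clip_alpha, adc_bits=6):
    """Clip partial sums to [-clip_alpha, clip_alpha], then quantize to adc_bits.

    clip_alpha stands in for the learned, reparametrized clipping threshold
    described in the paper; here it is just a fixed hyperparameter.
    """
    clipped = np.clip(partial_sums, -clip_alpha, clip_alpha)
    scale = clip_alpha / (2 ** (adc_bits - 1) - 1)
    return quantize_uniform(clipped, scale, adc_bits)

# Toy usage: a 256x256 weight matrix split across 2x2 crossbar-sized tiles,
# with the resulting column sums digitized as if by a 6-bit ADC.
rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256)).astype(np.float32)
x = rng.normal(size=(256,)).astype(np.float32)
w_q = array_wise_weight_quant(w, array_rows=128, array_cols=128, n_bits=4)
ps = w_q.T @ x                      # column sums (stand-in for per-array partial sums)
ps_q = clipped_partial_sum_quant(ps, clip_alpha=3.0, adc_bits=6)
```

Quantizing per array rather than per layer lets each crossbar use its full conductance range, and clipping the partial sums bounds the analog output swing so a lower-resolution ADC suffices, which is where the efficiency gain comes from.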
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2023.3298705