CIMQ: A Hardware-Efficient Quantization Framework for Computing-In-Memory Based Neural Network Accelerators

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024-01, Vol. 43 (1), p. 1-1
Main Authors: Bai, Jinyu; Sun, Sifan; Zhao, Weisheng; Kang, Wang
Format: Article
Language: English
Description
Summary: The novel Computing-In-Memory (CIM) technology has demonstrated significant potential in enhancing the performance and efficiency of convolutional neural networks (CNNs). However, due to the low precision of memory devices and data interfaces, an additional quantization step is necessary. Conventional NN quantization methods fail to account for the hardware characteristics of CIM, resulting in inferior system performance and efficiency. This paper proposes CIMQ, a hardware-efficient quantization framework designed to improve the efficiency of CIM-based NN accelerators. The holistic framework focuses on the fundamental computing elements in CIM hardware: inputs, weights, and outputs (or activations, weights, and partial sums in NNs), with four innovative techniques. First, bit-level-sparsity-induced activation quantization is introduced to decrease dynamic computation energy. Second, inspired by the unique computation paradigm of CIM, an innovative array-wise quantization granularity is proposed for weight quantization. Third, partial sums are quantized with a reparametrized clipping function to reduce the required resolution of analog-to-digital converters (ADCs). Finally, to improve the accuracy of quantized neural networks (QNNs), post-training quantization (PTQ) is enhanced with a random quantization-dropping strategy. The effectiveness of the proposed framework has been demonstrated through experimental results on various NNs and datasets (CIFAR-10, CIFAR-100, ImageNet). In typical cases, hardware efficiency can be improved by up to 222%, with a 58.97% improvement in accuracy compared to conventional quantization methods.
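Note: The summary above only names the four techniques. As a rough illustration of the two hardware-facing ones, the NumPy sketch below shows one plausible reading of array-wise weight quantization (one scale factor per crossbar sub-array) and of clipping partial sums to a bounded range before a low-resolution ADC. The function names, the array_rows mapping, and the fixed clipping scalar alpha are illustrative assumptions; the paper itself learns the clipping threshold through a reparametrized function, which is not reproduced here.

import numpy as np

def quantize_weights_arraywise(W, array_rows=64, n_bits=4):
    # Split the weight matrix along its input dimension into chunks of
    # `array_rows` rows (the rows mapped onto one CIM sub-array) and give
    # each chunk its own symmetric scale -- an "array-wise" granularity.
    qmax = 2 ** (n_bits - 1) - 1
    W_q = np.empty_like(W, dtype=np.int32)
    scales = []
    for start in range(0, W.shape[0], array_rows):
        chunk = W[start:start + array_rows]
        scale = np.abs(chunk).max() / qmax + 1e-12
        W_q[start:start + array_rows] = np.clip(np.round(chunk / scale), -qmax, qmax)
        scales.append(scale)
    return W_q, np.array(scales)

def quantize_partial_sum(psum, alpha, adc_bits=6):
    # Clip each partial sum to [-alpha, alpha] and map it onto the code
    # range of an ADC with `adc_bits` of resolution. In the paper the
    # clipping threshold is trainable; here it is simply a given scalar.
    qmax = 2 ** (adc_bits - 1) - 1
    clipped = np.clip(psum, -alpha, alpha)
    return np.round(clipped / alpha * qmax)

# Example: quantize a random 256x128 weight matrix mapped onto 64-row
# sub-arrays, then quantize the partial sums of the first sub-array.
W = np.random.randn(256, 128).astype(np.float32)
W_q, scales = quantize_weights_arraywise(W, array_rows=64, n_bits=4)
x = np.random.randn(64).astype(np.float32)
psum = x @ (W_q[:64] * scales[0])
psum_q = quantize_partial_sum(psum, alpha=np.abs(psum).max(), adc_bits=6)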
ISSN: 0278-0070; 1937-4151
DOI: 10.1109/TCAD.2023.3298705