Assessment of inference accuracy and memory capacity of computation-in-memory enabled neural network due to quantized weights, gradients, input and output signals, and memory non-idealities
Published in: Japanese Journal of Applied Physics, 2024-04, Vol. 63 (4), p. 4
Main Authors:
Format: Article
Language: English
Summary: This paper proposes an approach to enhance the efficiency of computation-in-memory (CiM) enabled neural networks. The proposed method partially quantizes the learning and inference processes within the neural network to increase training and inference speed while reducing energy and memory consumption. The impact of the quantization imposed by CiM is evaluated in terms of inference accuracy, and the effect of the non-idealities of the underlying memories, such as resistive random-access memory, on network accuracy is documented and reported. The results indicate that a certain quantization bit-precision threshold is necessary for the weights, input/output data, and gradients to maintain an acceptable level of inference accuracy. Notably, the experiments demonstrate a modest degradation of approximately 2.8% in inference accuracy compared with the same network trained without computation-in-memory. This accuracy trade-off is accompanied by a substantial improvement in memory footprint, with memory usage reduced by 62% during training and 93% during inference.
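The record does not include code; the following is a minimal sketch, assuming a uniform symmetric quantize-dequantize ("fake quantization") scheme, of how weights, gradients, and input/output activations could be restricted to a given bit precision before being used in a forward pass. The function name `quantize_dequantize`, the 4-bit setting, and the per-tensor scaling are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def quantize_dequantize(x: np.ndarray, n_bits: int) -> np.ndarray:
    """Uniform symmetric quantization: map x onto a signed grid of 2**(n_bits-1)-1 steps, then back to float."""
    levels = 2 ** (n_bits - 1) - 1                      # e.g. +/-7 representable steps for 4 bits
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / levels if max_abs > 0 else 1.0    # per-tensor scale (illustrative choice)
    q = np.clip(np.round(x / scale), -levels, levels)   # integer codes that would actually be stored
    return q * scale                                    # dequantized values carry the quantization error

# Example: quantize a weight matrix and an input vector to 4 bits and run a forward pass.
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 4)).astype(np.float32)
x = rng.normal(size=(4,)).astype(np.float32)

w_q = quantize_dequantize(w, n_bits=4)
x_q = quantize_dequantize(x, n_bits=4)
y = w_q @ x_q                                           # output computed from quantized operands
print("max weight quantization error:", float(np.max(np.abs(w - w_q))))
```

In general, memory savings of the kind quoted in the summary come from storing only the low-precision integer codes plus a scale factor rather than 32-bit floating-point values; 4-bit codes, for instance, occupy roughly one eighth of the space of 32-bit weights.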
ISSN: 0021-4922, 1347-4065
DOI: 10.35848/1347-4065/ad2e45