A Fixed-Point Quantization Technique for Convolutional Neural Networks Based on Weight Scaling
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: In order to make convolutional neural networks (CNNs) usable on smaller or mobile devices, it is necessary to reduce the computing, energy and storage requirements of these networks. One can achieve this through a fixed-point quantization of the weights and activations of a CNN, which are usually represented as 32-bit floating-point numbers. In this paper, we present an adaptation of convolutional and fully connected layers in order to obtain a high usage of the given value range of activations and weights. To this end, we introduce scaling factors obtained by a moving average to limit the weights and activations. Our model, quantized to 8 bit, outperforms the 7-layer baseline model from which it is derived as well as naive quantization by several percentage points. Our method does not require any additional operations during inference, and both the weights and activations have a fixed radix point.
ISSN: 2381-8549
DOI: 10.1109/ICIP.2019.8803490
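The summary above rests on two ingredients: mapping 32-bit floating-point weights and activations to 8-bit fixed-point values, and limiting their range with scaling factors obtained from a moving average. The sketch below illustrates that idea in NumPy; the names (`EmaRange`, `quantize_fixed_point`), the momentum value, and the symmetric signed 8-bit mapping are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

# Minimal sketch (not the paper's exact method): quantize a tensor to signed
# 8-bit fixed point using a scale derived from an exponential moving average
# of the observed maximum absolute value.

NUM_BITS = 8
QMAX = 2 ** (NUM_BITS - 1) - 1  # 127 for signed 8-bit values


class EmaRange:
    """Tracks a moving average of max|x| to limit the quantization range."""

    def __init__(self, momentum=0.99):
        self.momentum = momentum
        self.max_abs = None

    def update(self, x):
        batch_max = float(np.max(np.abs(x)))
        if self.max_abs is None:
            self.max_abs = batch_max
        else:
            self.max_abs = (self.momentum * self.max_abs
                            + (1.0 - self.momentum) * batch_max)
        return self.max_abs


def quantize_fixed_point(x, max_abs):
    """Map float values in [-max_abs, max_abs] to int8 with a fixed scale."""
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -QMAX - 1, QMAX).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tracker = EmaRange(momentum=0.99)
    # Simulate activations observed over many training batches.
    for _ in range(100):
        acts = rng.normal(0.0, 1.0, size=(64, 128)).astype(np.float32)
        max_abs = tracker.update(acts)
    q, scale = quantize_fixed_point(acts, max_abs)
    err = np.abs(dequantize(q, scale) - np.clip(acts, -max_abs, max_abs))
    print("scale:", scale, "max quantization error:", float(np.max(err)))
```

Tracking the range with a moving average rather than the per-batch maximum keeps the scale stable across batches, which is in the spirit of the abstract's statement that the radix point stays fixed and no additional operations are needed at inference time.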