
ENCORE Compression: Exploiting Narrow-width Values for Quantized Deep Neural Networks

Bibliographic Details
Main Authors: Jang, Myeongjae; Kim, Jinkwon; Kim, Jesung; Kim, Soontae
Format: Conference Proceeding
Language: English
Description
Summary: Deep Neural Networks (DNNs) have become a practical machine learning workload running on various Neural Processing Units (NPUs). For higher performance and lower hardware overhead, DNN datatype reduction through quantization has been proposed. Moreover, to alleviate the memory bottleneck caused by the large data size of DNNs, several zero-value-aware compression algorithms are used. However, these compression algorithms do not compress modern quantized DNNs well because the number of zero values decreases. We find that the latest quantized DNNs have data redundancy due to frequent narrow-width values. Because low-precision quantization reduces DNN datatypes to simple datatypes with fewer bits, scattered DNN data are gathered into a small number of discrete values, producing a biased data distribution; narrow-width values occupy a large proportion of this biased distribution. Moreover, the appropriate number of zero run-length bits can be changed dynamically according to DNN sparsity. Based on these observations, we propose a compression algorithm that exploits narrow-width values and variable zero run-lengths for quantized DNNs. In experiments with three quantized DNNs, our proposed scheme yields an average compression ratio of 2.99.
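
As a rough illustration of the two ideas named in the summary (narrow-width value encoding and a variable zero run-length field), the Python sketch below encodes a stream of quantized integers with a 2-bit tag per symbol. This is not the paper's ENCORE format: the tag layout, the 4-bit narrow width, and the run_bits parameter are hypothetical choices made only to show how a sparsity-dependent run-length width and a narrow-width fast path can shrink a quantized tensor.

# Illustrative sketch only; not the authors' ENCORE algorithm.
# Tag layout (hypothetical): 00 = zero run, 01 = narrow-width value, 10 = full-width value.

def encode(values, full_bits=8, narrow_bits=4, run_bits=3):
    """Encode a 1-D list of quantized integers into a bit string.

    run_bits would be chosen per network according to its sparsity:
    a sparser model benefits from a wider run-length field.
    """
    max_run = (1 << run_bits) - 1
    out = []
    i = 0
    n = len(values)
    while i < n:
        if values[i] == 0:
            # Count a run of zeros, capped at what fits in run_bits.
            run = 0
            while i < n and values[i] == 0 and run < max_run:
                run += 1
                i += 1
            out.append("00" + format(run, f"0{run_bits}b"))
        else:
            v = values[i]
            if -(1 << (narrow_bits - 1)) <= v < (1 << (narrow_bits - 1)):
                # Narrow-width value: store only narrow_bits (two's complement).
                out.append("01" + format(v & ((1 << narrow_bits) - 1), f"0{narrow_bits}b"))
            else:
                # Full-width value: store all full_bits.
                out.append("10" + format(v & ((1 << full_bits) - 1), f"0{full_bits}b"))
            i += 1
    return "".join(out)

if __name__ == "__main__":
    data = [0, 0, 0, 3, -2, 0, 0, 0, 0, 0, 17, 1]
    encoded = encode(data)
    raw_bits = len(data) * 8
    print(f"raw: {raw_bits} bits, encoded: {len(encoded)} bits, "
          f"ratio: {raw_bits / len(encoded):.2f}")

On this toy input the 96 raw bits compress to 38 bits (a ratio of about 2.5), driven by the zero runs and the values that fit in the 4-bit narrow width; the figure is a property of this sketch only, not a result from the paper.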
ISSN: 1558-1101
DOI: 10.23919/DATE54114.2022.9774545