
Lightweight Compression Of Neural Network Feature Tensors For Collaborative Intelligence

Bibliographic Details
Main Authors: Cohen, Robert A., Choi, Hyomin, Bajic, Ivan V.
Format: Conference Proceeding
Language: English
Online Access: Request full text
Description
Summary: In collaborative intelligence applications, part of a deep neural network (DNN) is deployed on a relatively low-complexity device such as a mobile phone or edge device, and the remainder of the DNN is processed where more computing resources are available, such as in the cloud. This paper presents a novel lightweight compression technique designed specifically to code the activations of a split DNN layer, with a complexity low enough for edge devices and without requiring any retraining. We also present a modified entropy-constrained quantizer design algorithm optimized for clipped activations. When applied to popular object-detection and classification DNNs, we were able to compress the 32-bit floating-point activations down to 0.6 to 0.8 bits, while keeping the loss in accuracy to less than 1%. Compared to HEVC, we found that the lightweight codec consistently provided better inference accuracy, by up to 1.3%. The performance and simplicity of this lightweight compression technique make it an attractive option for coding a layer's activations in split neural networks for edge/cloud applications.
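To make the clip-and-quantize idea in the abstract concrete, here is a minimal NumPy sketch of quantizing a feature tensor by clipping it to a fixed range and applying uniform scalar quantization. This is an illustration under assumed parameters (clip range, bit depth), not the paper's codec: the actual method uses an entropy-constrained quantizer design, and all function names here are hypothetical.

```python
import numpy as np

def clip_quantize(x, c_min, c_max, n_bits):
    """Clip a feature tensor to [c_min, c_max], then quantize it
    uniformly to 2**n_bits levels. Returns integer indices and the
    quantizer step size. (Plain uniform quantization for illustration;
    the paper designs an entropy-constrained quantizer instead.)"""
    levels = 2 ** n_bits - 1
    x_clipped = np.clip(x, c_min, c_max)
    step = (c_max - c_min) / levels
    q = np.round((x_clipped - c_min) / step).astype(np.uint8)
    return q, step

def dequantize(q, c_min, step):
    """Reconstruct approximate activations from quantized indices."""
    return q.astype(np.float32) * step + c_min

# Quantize a random activation tensor to 3 bits per element.
x = np.random.randn(1, 8, 8, 64).astype(np.float32)
q, step = clip_quantize(x, c_min=-2.0, c_max=2.0, n_bits=3)
x_hat = dequantize(q, c_min=-2.0, step=step)
```

In a split-DNN setting, the integer indices `q` (optionally entropy coded) would be what the edge device transmits; the cloud side dequantizes and feeds `x_hat` into the remaining layers.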
ISSN: 1945-788X
DOI: 10.1109/ICME46284.2020.9102797