Compressing RNNs to Kilobyte Budget for IoT Devices Using Kronecker Products
Published in: ACM Journal on Emerging Technologies in Computing Systems, 2021-10, Vol. 17 (4), p. 1-18
Main Authors:
Format: Article
Language: English
Summary: Microcontrollers (MCUs) make up the majority of processors in the world, with applications ranging from automobiles to medical devices. The Internet of Things promises to equip these resource-constrained MCUs with machine learning algorithms that provide always-on intelligence. Many IoT applications consume time-series data, which is naturally suited to recurrent neural networks (RNNs) such as LSTMs and GRUs. However, RNNs can be large and difficult to deploy on these devices, which have only a few kilobytes of memory. There is therefore a need for compression techniques that can significantly shrink RNNs without hurting task accuracy. This article introduces a method for compressing RNNs in resource-constrained environments using the Kronecker product (KP). KPs can compress RNN layers by 16× to 38× with minimal accuracy loss, and quantizing the resulting models to 8 bits pushes the compression factor to 50×. We compare KP with other state-of-the-art compression techniques across seven benchmarks spanning five applications and show that KP beats the task accuracy achieved by the other techniques by a large margin while simultaneously improving inference runtime. Because KP compression can sometimes introduce an accuracy loss, we develop a hybrid KP approach that provides fine-grained control over the compression ratio, allowing accuracy lost during compression to be regained by adding a small number of model parameters.
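The core idea behind KP compression is to replace a dense weight matrix W with a Kronecker product of two much smaller factors, W ≈ A ⊗ B, so that only the factors are stored, and matrix-vector products are computed without ever materializing W. The sketch below (a minimal NumPy illustration, not the paper's implementation; the 16×16 factor shapes and random weights are assumed purely for demonstration) shows the parameter savings and the standard "vec trick" identity (A ⊗ B) vec(X) = vec(B X Aᵀ) used to keep inference cheap:

```python
import numpy as np

# Hypothetical factor shapes, chosen only for illustration.
m1, n1, m2, n2 = 16, 16, 16, 16
rng = np.random.default_rng(0)
A = rng.standard_normal((m1, n1))
B = rng.standard_normal((m2, n2))

# Dense equivalent: a (m1*m2) x (n1*n2) = 256 x 256 matrix.
W = np.kron(A, B)

dense_params = W.size           # 65536 stored values for the dense layer
kp_params = A.size + B.size     # 512 stored values for the two KP factors
print(dense_params / kp_params) # → 128.0  (compression ratio of this toy shape)

# Matrix-vector product without materializing W:
# (A ⊗ B) vec(X) = vec(B X A^T), with column-major vec.
x = rng.standard_normal(n1 * n2)
X = x.reshape(n1, n2).T                  # n2 x n1, so that vec_col(X) == x
y_fast = (B @ X @ A.T).T.reshape(-1)     # column-major flatten of B X A^T
y_ref = W @ x
assert np.allclose(y_fast, y_ref)
```

Note that the 128× ratio above is an artifact of the toy shapes; the paper reports 16× to 38× on real RNN layers, where the factor shapes are constrained by accuracy. The vec trick also reduces the multiply cost, which is why the summary can claim improved inference runtime alongside the memory savings.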
ISSN: 1550-4832, 1550-4840
DOI: 10.1145/3440016