
Low Complex & High Accuracy Computation Approximations to Enable On-Device RNN Applications

Bibliographic Details
Main Authors: Pasupuleti, Sirish Kumar, Gadde, Raj Narayana, Rajagopal, Vasanthakumar, Vishnoi, Ashok, Sekhar, N Chandra, Kumar, R Chandra, Miniskar, Narasinga Rao
Format: Conference Proceeding
Language: English
Description
Summary: Recurrent Neural Networks (RNNs) have demonstrated excellent results on various Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) tasks. However, executing RNNs requires large amounts of memory and computation, which makes it difficult to achieve real-time performance on low-power devices such as smartphones. Hence, ASR and NLP applications such as voice assistants currently rely on cloud-based solutions. In this paper, to enable on-device inference, we propose efficient approximations for the weights of fully connected (FC) layers and for activation functions that reduce the computational complexity. The proposed approximations eliminate multiplications, divisions and exponential operations by replacing them with simple arithmetic operations (shifts and additions), significantly reducing the computation requirements without any perceivable loss of functional accuracy. The approximations also reduce the memory size and bandwidth requirements. We also present a lightweight VLIW-based DSP architecture that incorporates these approximations to enable on-device inference. The approximations have been tested on the proposed DSP with various RNN applications such as EESEN, LRCN and S2VT. With the approximations, the results show accuracies similar to those of the 32-bit floating-point reference, ∼8x-12x performance gains, and ∼2x-4x reductions in memory and bandwidth requirements. Moreover, the activation approximation shows better average and peak errors than the state of the art.
ISSN: 2158-1525
DOI: 10.1109/ISCAS.2019.8702528
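
Note: the summary above only describes the technique at a high level. As a rough illustration of the kind of transformation the abstract mentions (replacing multiplications, divisions and exponentials with shifts and additions), the C sketch below shows (a) a fully connected neuron evaluated with weights quantized to signed powers of two, so each multiply-accumulate becomes a shift and an add, and (b) a classic piecewise-linear (PLAN-style) sigmoid whose segment slopes are powers of two, avoiding exp() and division. The weight encoding (pot_weight_t), function names and segment breakpoints are illustrative assumptions, not the authors' implementation.

    /* Illustrative sketch only; not the implementation from the paper. */
    #include <stdint.h>
    #include <stddef.h>

    /* Assumed offline encoding: each FC weight is +/- 2^(-shift). */
    typedef struct {
        int8_t  sign;   /* +1 or -1 */
        uint8_t shift;  /* weight magnitude is 2^(-shift) in fixed point */
    } pot_weight_t;

    /* One FC-layer neuron: acc = sum_i w_i * x_i, computed with shifts and
     * adds instead of multiplications (arithmetic right shift assumed). */
    int32_t fc_neuron_pot(const pot_weight_t *w, const int32_t *x, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i) {
            int32_t term = x[i] >> w[i].shift;       /* multiply by 2^(-shift) */
            acc += (w[i].sign >= 0) ? term : -term;  /* sign via add/subtract */
        }
        return acc;
    }

    /* Piecewise-linear sigmoid (PLAN-style segments); the slopes are powers
     * of two, so in fixed point the multiplications below become shifts. */
    float sigmoid_pwl(float x)
    {
        float ax = (x < 0.0f) ? -x : x;
        float y;
        if (ax >= 5.0f)        y = 1.0f;
        else if (ax >= 2.375f) y = 0.03125f * ax + 0.84375f;
        else if (ax >= 1.0f)   y = 0.125f   * ax + 0.625f;
        else                   y = 0.25f    * ax + 0.5f;
        return (x < 0.0f) ? (1.0f - y) : y;  /* symmetry: sigmoid(-x) = 1 - sigmoid(x) */
    }

In this style of approximation, the accuracy/complexity trade-off is controlled by how finely the weights and activation segments are quantized; the paper reports that its particular scheme keeps accuracy at the level of the 32-bit float reference.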