Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Bibliographic Details
Published in:	arXiv.org 2024-05
Main Authors:	Xu, Mingbin; Jin, Alex; Wang, Sicheng; Su, Mu; Ng, Tim; Mason, Henry; Han, Shiyi; Lei, Zhihong; Deng, Yaqiao; Huang, Zhen; Krishnamoorthy, Mahesh
Format:	Article
Language:	English
Subjects:	Accuracy; Automatic speech recognition; Devices; Edge computing; Energy consumption; Floating point arithmetic; Neural networks; Smartphones

Description
Summary:	With increasingly powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other smart home automation devices. In this paper, we propose a series of model architecture adaptations, neural network graph transformations, and numerical optimizations to fit an advanced Conformer-based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve speech recognition over 5.26 times faster than real time (a real-time factor, RTF, of 0.19) on smart wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
ISSN:	2331-8422
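
The summary's final claim concerns pre-normalizers that stabilize layer normalization in low-precision arithmetic. The sketch below only illustrates why rescaling the input first can help; it is not the paper's method. It assumes a simple max-absolute-value (L-infinity) scaling, emulates half precision with Python's `struct` format `e`, and uses hypothetical names (`to_fp16`, `layer_norm_fp16`, `pre_normalized_layer_norm_fp16`) and an assumed `EPS` value that do not come from the record.

```python
import math
import struct

EPS = 1e-5  # assumed stabilizer; not a value taken from the paper


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def fp16_sum(values):
    """Accumulate in half precision, rounding after every addition."""
    acc = 0.0
    for v in values:
        acc = to_fp16(acc + v)
    return acc


def layer_norm_fp16(xs):
    """Layer normalization (no learned gain or bias), rounded to fp16 at each step."""
    n = len(xs)
    mean = to_fp16(fp16_sum(xs) / n)
    centered = [to_fp16(x - mean) for x in xs]
    var = to_fp16(fp16_sum(to_fp16(c * c) for c in centered) / n)
    std = to_fp16(math.sqrt(var + EPS))
    return [to_fp16(c / std) for c in centered]


def pre_normalized_layer_norm_fp16(xs):
    """Rescale by the largest magnitude first (an assumed, illustrative pre-normalizer)."""
    scale = max(abs(x) for x in xs) or 1.0  # avoid dividing by zero on all-zero input
    return layer_norm_fp16([to_fp16(x / scale) for x in xs])


def layer_norm_fp64(xs):
    """Double-precision reference for comparison."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [(x - mean) / math.sqrt(var + EPS) for x in xs]


x = [300.0, -300.0, 600.0, -600.0]
naive = layer_norm_fp16(x)
stable = pre_normalized_layer_norm_fp16(x)
reference = layer_norm_fp64(x)
```

With these inputs the squared values exceed the largest finite half-precision number (65504), so the variance in `layer_norm_fp16` overflows to infinity and every output collapses to zero. Because layer normalization is scale-invariant up to the small `EPS` term, dividing by the largest magnitude first keeps every intermediate value within range, and `stable` stays close to `reference`.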