
Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, Vol. 30, pp. 1474-1482
Main Authors: Yu, Fu-Hao; Chen, Kuan-Yu; Lu, Ke-Han
Format: Article
Language: English
Description
Summary: Transformer-based models have driven significant innovation across classic and practical fields, including speech processing, natural language processing, and computer vision. Built on the Transformer, attention-based end-to-end automatic speech recognition (ASR) models have become popular in recent years. Specifically, non-autoregressive modeling is an emerging research topic: it achieves fast inference while remaining competitive with conventional autoregressive methods. In the context of natural language processing, the bidirectional encoder representations from Transformers (BERT) model and its variants have received widespread attention, partly because they infer contextualized word representations and achieve superior performance on downstream tasks through simple fine-tuning. However, to our knowledge, leveraging the synergy of non-autoregressive modeling and pre-trained language models for ASR remains relatively underexplored. To this end, this study presents a novel non-autoregressive ASR framework based on a pre-trained language model. A series of experiments on two publicly available Chinese datasets, AISHELL-1 and AISHELL-2, demonstrates that the proposed ASR models achieve competitive or superior results compared with well-practiced baseline systems. In addition, a set of comparative experiments with different settings analyzes the behavior of the proposed framework.
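The abstract's central contrast is between autoregressive decoding, which emits one token per step conditioned on its predecessors, and non-autoregressive decoding, which predicts every output position in a single parallel pass. The following minimal PyTorch sketch illustrates that difference only; the module shapes, the <sos> placeholder id, and the fused projection are all hypothetical and do not reproduce the framework proposed in the paper.

import torch
import torch.nn as nn

# Illustration only -- a toy contrast between autoregressive (AR) and
# non-autoregressive (NAR) decoding over mock acoustic-encoder states.
# Nothing here reproduces the paper's architecture.
VOCAB, D_MODEL, STEPS = 100, 32, 8
torch.manual_seed(0)
enc_out = torch.randn(1, STEPS, D_MODEL)    # mock encoder output (batch=1)
embed = nn.Embedding(VOCAB, D_MODEL)        # hypothetical token embedding
ar_proj = nn.Linear(2 * D_MODEL, VOCAB)     # fuses encoder state + history

# AR decoding: each step conditions on the previously emitted token, so
# the loop is inherently sequential and latency grows with output length.
prev = torch.zeros(1, dtype=torch.long)     # hypothetical <sos> id 0
ar_tokens = []
for t in range(STEPS):
    fused = torch.cat([enc_out[:, t], embed(prev)], dim=-1)
    prev = ar_proj(fused).argmax(-1)
    ar_tokens.append(prev)

# NAR decoding: all positions are predicted at once from the encoder
# states alone -- the single parallel pass behind the inference-speed
# advantage the abstract describes.
nar_proj = nn.Linear(D_MODEL, VOCAB)
nar_tokens = nar_proj(enc_out).argmax(-1)   # shape (1, STEPS)
print(torch.stack(ar_tokens, dim=1).shape, nar_tokens.shape)

In practice, a BERT-style masked language model can fill the NAR role by predicting all masked token positions jointly, which is the synergy the abstract points to.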
ISSN: 2329-9290
2329-9304
DOI: 10.1109/TASLP.2022.3166400