Non-Autoregressive ASR Modeling Using Pre-Trained Language Models for Chinese Speech Recognition
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022, Vol. 30, pp. 1474-1482
Main Authors:
Format: Article
Language: English
Summary: Transformer-based models have led to significant innovation in a wide range of research areas, including speech processing, natural language processing, and computer vision. Building on the Transformer, attention-based end-to-end automatic speech recognition (ASR) models have become popular in recent years. In particular, an emerging research topic is non-autoregressive modeling, which achieves fast inference while obtaining performance competitive with conventional autoregressive methods. In addition, in the context of natural language processing, the Bidirectional Encoder Representations from Transformers (BERT) model and its variants have received widespread attention, partly due to their ability to infer contextualized word representations and to achieve strong performance on downstream tasks through simple fine-tuning. However, to our knowledge, combining non-autoregressive modeling with pre-trained language models for ASR remains relatively underexplored. In this regard, this study presents a novel pre-trained language model-based non-autoregressive ASR framework. A series of experiments was conducted on two publicly available Chinese datasets, AISHELL-1 and AISHELL-2, demonstrating that the proposed ASR models achieve competitive or superior results compared with well-established baseline systems. In addition, a set of comparative experiments with different settings was carried out to analyze the performance of the proposed framework.
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2022.3166400
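
As a rough illustration of the parallel (non-autoregressive) decoding idea described in the summary, the sketch below fills every output position of a Chinese BERT masked language model in a single forward pass instead of generating tokens one by one. This is not the authors' actual framework: the acoustic encoder, conditioning on speech features, and length prediction are omitted, and the fixed target length, the `bert-base-chinese` checkpoint, and the use of the Hugging Face `transformers` library are assumptions made purely for illustration.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Toy sketch of non-autoregressive decoding with a Chinese masked LM:
# all output positions are predicted simultaneously in one forward pass.
# In a real NAR ASR system, the decoder would additionally be conditioned
# on acoustic features, and the output length would come from a length
# predictor; here the length is simply assumed to be known.

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

target_length = 6  # hypothetical output length (normally predicted, not fixed)
tokens = ["[CLS]"] + ["[MASK]"] * target_length + ["[SEP]"]
input_ids = torch.tensor([tokenizer.convert_tokens_to_ids(tokens)])

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

# Every masked position is filled in the same decoding step (one iteration),
# which is what makes inference fast relative to token-by-token decoding.
mask_positions = input_ids == tokenizer.mask_token_id
predicted_ids = logits.argmax(dim=-1)[mask_positions]
print(tokenizer.decode(predicted_ids))
```

Without acoustic conditioning, the model simply hallucinates a fluent character sequence; the point of the sketch is only to show how a BERT-style masked LM can emit all positions in parallel, which is the property the non-autoregressive ASR framework in the article exploits for fast inference.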