Low Resource Language Adaptation using Two-stage Regularization for Multilingual ASR
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: A significant portion of the global population speaks multiple languages, including many low-resource languages on which current multilingual ASR models perform poorly. To improve low-resource performance, such models are adapted to low-resource data, and only a small number of (extra) parameters are fine-tuned to prevent overfitting. Multiple models are fine-tuned to support different languages; during inference, one of the models is selected for transcription depending on the language to be transcribed. However, for applications such as smart home devices, multilingual speakers may use a different language each time, so a model cannot be selected when the language to transcribe is not known beforehand. To address this, this paper proposes a two-stage regularization-based continual learning method that adapts a single model to transcribe both the source and target languages while preventing overfitting. Specifically, in stage-one training, strong regularization is used to update all parameters and prevent overfitting, while in stage two the regularization is relaxed to improve the model's learning capacity. By adapting Whisper to 10 hours of data for each of two languages from Common Voice, our method reduces the average word error rate from 18.59% to 15.52%.
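The abstract does not spell out which regularizer is used. As a minimal sketch, assuming an L2 penalty that pulls the adapted weights toward the frozen pretrained (source-language) weights, the two-stage schedule could be implemented as below. All names and hyperparameters (`anchor_penalty`, `asr_loss`, `lam_strong`, `lam_relaxed`, epoch counts, learning rate) are illustrative assumptions, not taken from the paper.

```python
import torch

def anchor_penalty(model, anchor, lam):
    # L2 penalty pulling the current weights toward the frozen pretrained
    # ("anchor") weights; a larger lam means stronger regularization.
    pen = sum(((p - p0) ** 2).sum() for p, p0 in zip(model.parameters(), anchor))
    return lam * pen

def adapt_two_stage(model, loader, asr_loss,
                    lam_strong=1e-2,   # hypothetical stage-1 strength
                    lam_relaxed=1e-4,  # hypothetical stage-2 strength
                    epochs=3, lr=1e-5):
    # Stage 1: update ALL parameters under strong regularization to
    # prevent overfitting on the small target-language dataset.
    # Stage 2: relax the regularization to recover learning capacity.
    # asr_loss(model, batch) is an assumed helper returning the task
    # loss (e.g., cross-entropy over the reference transcripts).
    anchor = [p.detach().clone() for p in model.parameters()]
    for lam in (lam_strong, lam_relaxed):          # the two stages
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        for _ in range(epochs):
            for batch in loader:
                loss = asr_loss(model, batch) + anchor_penalty(model, anchor, lam)
                opt.zero_grad()
                loss.backward()
                opt.step()
    return model
```

In practice the paper's regularizer may differ, e.g. a Fisher-weighted (EWC-style) penalty rather than a uniform L2 term; only the two-stage strong-then-relaxed schedule is taken from the abstract.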
ISSN: 2159-1970
DOI: 10.1109/IALP63756.2024.10661108