A Comparison of Parameter-Efficient ASR Domain Adaptation Methods for Universal Speech and Language Models
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: A recent paradigm shift in artificial intelligence has seen the rise of foundation models, such as large language models and universal speech models. With billions of parameters and trained on a wide range of data, these foundation models are expected to generalize better to different downstream tasks. Efficient adaptation is the key to leveraging these foundation models in a new task or domain. In this paper, we compare several popular parameter-efficient tuning methods, such as vector adaptation, residual adapters, low-rank adapters (LoRA), and prompt-tuning, for automatic speech recognition (ASR) domain adaptation. We use a connectionist temporal classification (CTC) model with a Conformer encoder and fuse it with a universal language model. We study the effect of adapting either or both of the Conformer encoder and the universal language model. We carry out extensive experiments to study these methods under different hyper-parameter settings and the effect of combining some of these methods. We find that combining vector adaptation and residual adapters with an increasing bottleneck dimension achieves the best performance.
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10445894
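The summary names several parameter-efficient tuning methods (residual adapters, LoRA) without showing their form. As a rough orientation only, the sketch below illustrates a residual bottleneck adapter and a LoRA-style low-rank update in PyTorch; the module names, dimensions, and placement are illustrative assumptions, not the configuration used in the paper.

```python
# Illustrative sketch (not from the paper): a residual bottleneck adapter and a
# LoRA-style linear layer of the general kind named in the summary.
# d_model, bottleneck_dim, and rank are assumed values for illustration only.
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Bottleneck adapter added to a frozen layer's output; the residual
    connection lets it start near identity and learn a small correction."""
    def __init__(self, d_model: int, bottleneck_dim: int):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck_dim)   # project down
        self.up = nn.Linear(bottleneck_dim, d_model)     # project back up
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))       # residual connection


class LoRALinear(nn.Module):
    """Frozen linear layer augmented with a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                 # freeze base weights
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)               # start as a no-op update
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))


if __name__ == "__main__":
    x = torch.randn(2, 50, 256)                          # (batch, frames, d_model)
    adapter = ResidualAdapter(d_model=256, bottleneck_dim=64)
    lora = LoRALinear(nn.Linear(256, 256), rank=8)
    print(adapter(x).shape, lora(x).shape)               # both preserve input shape
```

In both sketches the added path is initialized so that training starts from the frozen model's behavior (zero-initialized up-projection or LoRA B matrix), which is the usual motivation behind these adapter families: only a small number of new parameters are updated while the foundation model stays fixed.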