Breaking Barriers: Can Multilingual Foundation Models Bridge the Gap in Cross-Language Speech Emotion Recognition?
Main Authors:
Format: Conference Proceeding
Language: English
Summary: Speech emotion recognition (SER) faces challenges in cross-language scenarios due to differences in the linguistic and cultural expression of emotions across languages. Recently, large multilingual foundation models pre-trained on massive corpora have achieved strong performance on natural language understanding tasks by learning cross-lingual representations. Their ability to capture relationships between languages without direct translation opens up possibilities for more broadly applicable multilingual models. In this paper, we evaluate the capabilities of foundation models (Wav2Vec2, XLSR, Whisper and MMS) to bridge the gap in cross-language SER. Specifically, we analyse their performance on benchmark cross-language SER datasets involving four languages for emotion classification. Our experiments show that the foundation models outperform CNN-LSTM baselines, establishing their superiority in cross-lingual transfer learning for emotion recognition. However, self-supervised pre-training plays a key role, and inductive biases alone are insufficient for high cross-lingual generalisability. Foundation models also demonstrate gains over baselines with limited target data and better performance on noisy data. Our findings indicate that while foundation models hold promise, pre-training remains vital for handling linguistic variations across languages for SER.
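The transfer-learning setup the abstract describes (a pretrained multilingual encoder feeding an emotion classifier) is commonly realised by pooling the encoder's frame-level hidden states into one utterance vector and attaching a small classification head. Below is a minimal sketch of that pipeline shape using random stand-in features; the dimensions, emotion labels, and linear head are illustrative assumptions, not taken from the paper, and real features would come from a pretrained model such as Wav2Vec2 or XLSR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frame-level hidden states from a multilingual foundation
# model (e.g. Wav2Vec2/XLSR), shape [num_frames, hidden_dim].
# Random values here only illustrate the tensor shapes involved.
T, H, NUM_EMOTIONS = 120, 768, 4   # hypothetical: angry, happy, neutral, sad
frame_features = rng.standard_normal((T, H))

def mean_pool(features: np.ndarray) -> np.ndarray:
    """Collapse variable-length frame features into one utterance vector."""
    return features.mean(axis=0)

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over emotion logits."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical linear classification head; in practice its weights would
# be learned by fine-tuning on (possibly limited) target-language data.
W = rng.standard_normal((H, NUM_EMOTIONS)) * 0.01
b = np.zeros(NUM_EMOTIONS)

utterance_vec = mean_pool(frame_features)          # shape (768,)
probs = softmax(utterance_vec @ W + b)             # shape (4,), sums to 1
predicted_emotion = int(np.argmax(probs))
```

Mean pooling is only one design choice; attention pooling or using the encoder's CLS-style summary token are common alternatives when fine-tuning such models for SER.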
ISSN: 2831-7343
DOI: 10.1109/SNAMS60348.2023.10375468