Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Sub-utterance level language identification (SLID) is the automatic process of recognizing the spoken language in a code-switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggests that the primary language occupies a significantly longer duration than the secondary. In a CS utterance, a single speaker speaks both languages; hence the phoneme-level acoustic characteristics (sub-segmental and segmental evidence) of the secondary language are mostly biased towards the primary. This leads to the hypothesis that an acoustic-based language identification system trained on CS data may end up with performance biased towards the primary language. This study confirms the hypothesis by examining the confusion matrices of earlier proposed approaches. At the same time, language discrimination can also be performed at the suprasegmental level, by capturing language-specific phonemic temporal evidence. Hence, to resolve the biasing issue, this study proposes a wav2vec2-based approach, which captures suprasegmental phonemic temporal patterns in the pre-training stage and exploits them to capture language-specific suprasegmental evidence in the fine-tuning stage. The experimental results show the proposed approach resolves the issue to some extent. As the fine-tuning stage uses a discriminative approach, weighted-loss and secondary-language augmentation methods can be explored in the future for further performance improvement. |
---|---|
ISSN: | 2474-915X |
DOI: | 10.1109/SPCOM55316.2022.9840813 |
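The summary describes a wav2vec2 encoder, pre-trained on audio and then fine-tuned discriminatively for language identification. As a rough illustration only (not the authors' code), the sketch below uses the HuggingFace `transformers` `Wav2Vec2ForSequenceClassification` class with a tiny, randomly initialized configuration so it runs without downloading a checkpoint; the two-label set (primary vs. secondary language) and all model sizes are assumptions. In practice one would load a self-supervised pre-trained checkpoint and fine-tune it on labeled sub-utterance segments.

```python
# Minimal sketch, assuming HuggingFace transformers + PyTorch are available.
# A tiny random config stands in for a pre-trained wav2vec2 encoder.
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),          # shrunken CNN feature extractor
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=4,
    num_labels=2,               # assumed labels: primary vs. secondary language
)
model = Wav2Vec2ForSequenceClassification(config)
model.eval()

# One second of 16 kHz audio standing in for a sub-utterance segment.
waveform = torch.randn(1, 16000)
with torch.no_grad():
    logits = model(input_values=waveform).logits

print(logits.shape)  # one score per candidate language: (batch, num_labels)
```

For fine-tuning, these logits would be trained with a cross-entropy loss over per-segment language labels; the weighted-loss idea mentioned in the summary would correspond to passing class weights to that loss to counter the primary-language bias.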