Issues in Sub-Utterance Level Language Identification in a Code Switched Bilingual Scenario
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Sub-utterance level language identification (SLID) is the automatic process of recognizing the spoken language in a code-switched (CS) utterance at the sub-utterance level. The nature of CS utterances suggests that the primary language occupies a significantly longer duration than the secondary. In a CS utterance, a single speaker speaks both languages; hence the phoneme-level acoustic characteristics (sub-segmental and segmental evidence) of the secondary language are mostly biased towards the primary. This leads to the hypothesis that an acoustic-based language identification system trained on CS data may end up with performance biased towards the primary language. This study confirms the hypothesis by examining the confusion matrices of earlier proposed approaches. At the same time, language discrimination can also be performed at the suprasegmental level, by capturing language-specific phonemic temporal evidence. Hence, to resolve the biasing issue, this study proposes a wav2vec2-based approach, which captures suprasegmental phonemic temporal patterns in the pre-training stage and exploits them to capture language-specific suprasegmental evidence in the fine-tuning stage. The experimental results show the proposed approach resolves the issue to some extent. As the fine-tuning stage uses a discriminative approach, weighted-loss and secondary-language augmentation methods can be explored in the future for further performance improvement. |
---|---|
ISSN: | 2474-915X |
DOI: | 10.1109/SPCOM55316.2022.9840813 |
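The summary describes a wav2vec2 encoder, pre-trained on audio and then fine-tuned discriminatively for language identification. As a rough illustration only (not the authors' code), the sketch below uses the HuggingFace `transformers` `Wav2Vec2ForSequenceClassification` class with a tiny, randomly initialized configuration so it runs without downloading a checkpoint; the two-label set (primary vs. secondary language) and all model sizes are assumptions. In practice one would load a self-supervised pre-trained checkpoint and fine-tune it on labeled sub-utterance segments.

```python
# Minimal sketch, assuming HuggingFace transformers + PyTorch are available.
# A tiny random config stands in for a pre-trained wav2vec2 encoder.
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForSequenceClassification

config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    conv_dim=(32, 32),          # shrunken CNN feature extractor
    conv_kernel=(10, 3),
    conv_stride=(5, 2),
    num_conv_pos_embeddings=16,
    num_conv_pos_embedding_groups=4,
    num_labels=2,               # assumed labels: primary vs. secondary language
)
model = Wav2Vec2ForSequenceClassification(config)
model.eval()

# One second of 16 kHz audio standing in for a sub-utterance segment.
waveform = torch.randn(1, 16000)
with torch.no_grad():
    logits = model(input_values=waveform).logits

print(logits.shape)  # one score per candidate language: (batch, num_labels)
```

For fine-tuning, these logits would be trained with a cross-entropy loss over per-segment language labels; the weighted-loss idea mentioned in the summary would correspond to passing class weights to that loss to counter the primary-language bias.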