Automatic Speech Recognition of English-isiZulu Code-switched Speech from South African Soap Operas

Bibliographic Details
Published in: Procedia Computer Science 2016, Vol. 81, p. 121-127
Main Authors: van der Westhuizen, Ewald; Niesler, Thomas
Format: Article
Language: English
Description
Summary: We introduce a new English-isiZulu code-switched speech corpus compiled from South African soap opera broadcasts. isiZulu itself is currently under-resourced, and automatic speech recognition is made even more challenging by the high prevalence of code-switching in spontaneous speech. Analysis of the corpus reflects effects common in conversational isiZulu, such as vowel deletion and cross-language prefixes and suffixes. Baseline monolingual and code-switched automatic speech recognition systems are developed, including a new language model configuration that explicitly includes switching transitions. For code-switched speech, a system with language-dependent acoustic models and language-dependent language models linked by switching transitions gives the best performance, although word error rates remain very high overall.
ISSN: 1877-0509
DOI: 10.1016/j.procs.2016.04.039