Loading…
Automatic pronunciation prediction for text-to-speech synthesis of dialectal arabic in a speech-to-speech translation system
Text-to-speech synthesis (TTS) is the final stage in the speech-tospeech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. This is problematic when t...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Text-to-speech synthesis (TTS) is the final stage in the speech-tospeech (S2S) translation pipeline, producing an audible rendition of translated text in the target language. TTS systems typically rely on a lexicon to look up pronunciations for each word in the input text. This is problematic when the target language is dialectal Arabic, because the statistical machine translation (SMT) system usually produces undiacritized text output. Many words in the latter possess multiple pronunciations; the correct choice must be inferred from context. In this paper, we present a weakly supervised pronunciation prediction approach for undiacritized dialectal Arabic in S2S systems that leverages automatic speech recognition (ASR) to obtain parallel training data for pronunciation prediction. Additionally, we show that incorporating source language features derived from SMT-generated automatic word alignment further improves automatic pronunciation prediction accuracy. |
---|---|
ISSN: | 1520-6149 2379-190X |
DOI: | 10.1109/ICASSP.2012.6289032 |