Automated grapheme-to-phoneme conversion for Central Kurdish based on optimality theory
Published in: | Computer Speech & Language, Vol. 70, Article 101222 (November 2021) |
---|---|
Format: | Article |
Language: | English |
Summary: | • Reviewed the phonology and alphabet of Central Kurdish, a low-resourced language.
• Proposed an adequate rule-based method for G2P conversion of Central Kurdish.
• Specified and ranked phonological constraints in the framework of Optimality Theory.
• Showed that the current alphabet of Central Kurdish does not need additional letters.
The writing system of Central Kurdish has three cases in which there is no one-to-one mapping between orthographic letters and the phonemes of the language. Consequently, written words that contain these cases may be pronounced in multiple ways. Finding the correct pronunciation of a written word is called grapheme-to-phoneme (G2P) conversion, a key step in natural language processing tasks such as speech synthesis. Because Central Kurdish is a low-resourced language, we present a G2P conversion method based on the phonological rules of the language rather than on pronunciation dictionaries and data-driven learning methods. After reviewing the phonology and alphabet of the language through the framework of Optimality Theory, we generate all possible pronunciations of a word. Then, by specifying and applying ranked constraints, we eliminate undesirable candidates so that only one well-formed pronunciation per word remains. Evaluated on two datasets, the proposed method achieved an overall Phoneme Error Rate (PER) of 0.75%, a precision of 94.71% in detecting the short vowel /i/, and 100% accuracy in converting the letters “ی” and “و”. These results suggest that the current orthographic system of Central Kurdish does not need additional letters. The approach also yields a ranked suggestion list for manually checking the few ambiguous cases that remain unresolved. |
ISSN: | 0885-2308, 1095-8363 |
DOI: | 10.1016/j.csl.2021.101222 |
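The abstract describes a generate-and-filter procedure: enumerate every pronunciation the spelling allows, then eliminate candidates with ranked constraints. Below is a minimal Python sketch of that Optimality Theory pattern under loudly stated assumptions. Words are written in a toy romanization in which the hypothetical placeholders `W` and `Y` stand for the ambiguous letters “و” (assumed read /w/ or /u/) and “ی” (assumed read /j/ or /i/), and `ɪ` stands for the unwritten short vowel. The three constraints (ONSET, *COMPLEX, DEP) and their ranking are textbook OT stand-ins, not the constraints specified in the paper, and all function names (`gen`, `evaluate`, `winners`, `per`) are illustrative.

```python
# Toy OT pipeline for G2P disambiguation (illustrative, not the paper's grammar).
# Hypothetical stand-ins: "W" = و (/w/ or /u/), "Y" = ی (/j/ or /i/),
# "ɪ" = the unwritten short vowel. Each character is one phoneme.
from itertools import product

AMBIGUOUS = {"W": ("w", "u"), "Y": ("j", "i")}
VOWELS = set("aeiouɪ")
HIDDEN = "ɪ"


def is_vowel(p: str) -> bool:
    return p in VOWELS


def gen(word: str) -> list[str]:
    """GEN: every reading of the ambiguous letters, with or without a
    hidden short vowel in each gap between two adjacent consonants."""
    options = [AMBIGUOUS.get(ch, (ch,)) for ch in word]
    candidates = set()
    for segs in product(*options):
        gaps = [k for k in range(1, len(segs))
                if not is_vowel(segs[k - 1]) and not is_vowel(segs[k])]
        for mask in product((False, True), repeat=len(gaps)):
            insert_at = {g for g, on in zip(gaps, mask) if on}
            out = []
            for k, seg in enumerate(segs):
                if k in insert_at:
                    out.append(HIDDEN)  # insert the hidden vowel before segment k
                out.append(seg)
            candidates.add("".join(out))
    return sorted(candidates)


def onset(c: str) -> int:
    """ONSET: one violation per vowel that is word-initial or follows a vowel."""
    return sum(1 for k, p in enumerate(c)
               if is_vowel(p) and (k == 0 or is_vowel(c[k - 1])))


def no_complex(c: str) -> int:
    """*COMPLEX: one violation per pair of adjacent consonants."""
    return sum(1 for k in range(1, len(c))
               if not is_vowel(c[k - 1]) and not is_vowel(c[k]))


def dep(c: str) -> int:
    """DEP: one violation per inserted hidden vowel."""
    return c.count(HIDDEN)


# Hypothetical ranking: ONSET >> *COMPLEX >> DEP
RANKING = (onset, no_complex, dep)


def profile(c: str) -> tuple[int, ...]:
    return tuple(f(c) for f in RANKING)


def evaluate(word: str) -> list[str]:
    """EVAL: order candidates by violation profile. Lexicographic tuple
    comparison implements strict domination; ties are broken alphabetically,
    so the result doubles as a ranked suggestion list."""
    return sorted(gen(word), key=lambda c: (profile(c), c))


def winners(word: str) -> list[str]:
    """All candidates sharing the best profile; more than one means the
    ambiguity is unresolved and needs manual checking."""
    ranked = evaluate(word)
    best = profile(ranked[0])
    return [c for c in ranked if profile(c) == best]


def per(reference: str, hypothesis: str) -> float:
    """Phoneme Error Rate of one word: Levenshtein distance over phonemes
    divided by the reference length."""
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, 1):
        cur = [i]
        for j, h in enumerate(hypothesis, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(reference)


print(evaluate("bYr"))    # ['bir', 'bɪjɪr', 'bjɪr', 'bɪjr', 'bjr']
print(winners("Wan"))     # ['wan']
print(per("bir", "bjr"))  # 0.3333333333333333
```

Sorting by the violation tuple selects the same winner as OT's constraint-by-constraint elimination, and the remaining order plays the role of the ranked suggestion list mentioned in the abstract. The `per` helper scores a single word; a corpus-level PER would typically divide the summed edit distances by the summed reference lengths, which is an assumption here rather than a detail taken from the paper.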