Standard Yorùbá Context Dependent Tone Identification Using Multi-Class Support Vector Machine (MSVM)
Published in: | Journal of Applied Sciences and Environmental Management 2019-11, Vol.23 (5), p.895 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Most state-of-the-art large-vocabulary continuous speech recognition
systems employ context-dependent (CD) phone units; however, CD
phone units are not efficient at capturing the long-term spectral
dependencies of tone in most tone languages. Standard
Yorùbá (SY) is a language composed of syllables with tones and
requires a different method for acoustic modeling. In this paper, a
context-dependent tone acoustic model was developed. The tone unit is
assumed to be the syllable. The average magnitude difference function (AMDF)
was used to derive the utterance-wide F0 contour, followed by automatic
syllabification and tri-syllable forced alignment with the SPPAS
(Speech Phonetization Alignment and Syllabification) tool. For
classification of the CD tones, the slope and intercept
of the F0 values were extracted from each segmented unit. A supervised
clustering scheme was used to partition the CD tri-tones by
category, and the partitions were normalized statistically to derive the acoustic
feature vectors. A multi-class support vector machine (MSVM) was used for
tri-tone training. The experimental results showed that
the word recognition accuracy obtained from the MSVM tri-tone system
based on dynamic-programming tone-embedded features was comparable to
that obtained with phone features. The best parameter tuning was obtained with 10-fold
cross-validation, giving an overall accuracy of 97.5678%. In terms of word error
rate (WER), the MSVM CD tri-tone system outperformed the hidden Markov
model tri-phone system, with a WER of 44.47%. |
---|---|
ISSN: | 1119-8362 2659-1502 2659-1499 |
DOI: | 10.4314/jasem.v23i5.20 |
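
The summary describes deriving an F0 contour with AMDF and then taking the slope and intercept of the F0 values in each segmented unit. The sketch below illustrates that idea only; it is not the authors' implementation. The function names, the default F0 search range, and the frame-by-frame layout are assumptions made for illustration, and the record does not state the paper's actual frame sizes or normalization.

```python
import math


def amdf_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Estimate the F0 (Hz) of one frame as the lag that minimizes the AMDF.

    f0_min and f0_max are illustrative defaults, not values from the paper.
    """
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(frame) - 1)
    if max_lag < min_lag:
        raise ValueError("frame is too short for the requested F0 range")

    best_lag, best_value = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(frame) - lag
        # Average magnitude difference between the frame and its shifted copy.
        value = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if value < best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def slope_intercept(f0_values):
    """Least-squares line through the points (frame index, F0)."""
    n = len(f0_values)
    if n < 2:
        raise ValueError("need at least two F0 values to fit a line")
    mean_t = (n - 1) / 2
    mean_f = sum(f0_values) / n
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in enumerate(f0_values))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t


# Synthetic 200 Hz tone sampled at 8 kHz (one period = 40 samples).
tone = [math.sin(2 * math.pi * 200 * i / 8000) for i in range(400)]
f0 = amdf_f0(tone, 8000, f0_min=120.0)

# A rising F0 track over four frames of a hypothetical syllable.
slope, intercept = slope_intercept([100.0, 110.0, 120.0, 130.0])
```

A perfectly periodic signal has equally deep AMDF valleys at every multiple of its period, which is why the example narrows the search range with `f0_min=120.0`. On real speech, a practical tracker would also need voicing detection and some handling of octave errors, neither of which is shown here.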