Role of Prosodic Features on Children's Speech Recognition

Bibliographic Details
Main Authors: Kathania, Hemant K., Shahnawazuddin, S., Adiga, Nagaraj, Ahmad, Waquar
Format: Conference Proceeding
Language:English
Description
Summary: This paper explores the role of combining prosodic variables with existing acoustic features for children's speech recognition using acoustic models trained on adults' speech. The acoustic features explored are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC), while the prosodic variables considered are loudness, voice-intensity and voice-probability. An analysis presented in the paper shows that, when the textual content is the same, the prosodic variables exhibit very similar contours for adults' and children's speech, whereas the contours differ substantially when the content is different. Consequently, including prosodic information reduces inter-speaker differences and increases class discrimination, which in turn improves recognition performance. Further improvements are obtained by projecting the feature vectors formed by combining the two feature types onto a lower-dimensional subspace. These findings are experimentally verified for mismatched speech recognition using a deep neural network (DNN) based system. On combining MFCC (PLPCC) with prosodic features, a relative improvement of 16% (14%) is noted when decoding children's speech using DNN models trained on adult data.
ISSN:2379-190X
DOI:10.1109/ICASSP.2018.8461668
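
The feature combination and subspace projection mentioned in the summary can be illustrated with a minimal sketch. The summary does not say which projection the authors use, so PCA is assumed here purely for illustration; the function names, the feature dimensions (13 acoustic, 3 prosodic, 10 after projection) and the random data are likewise hypothetical and are not taken from the paper.

```python
import numpy as np


def combine_features(acoustic, prosodic):
    """Concatenate per-frame acoustic and prosodic features along the feature axis."""
    if acoustic.shape[0] != prosodic.shape[0]:
        raise ValueError("both feature streams must have the same number of frames")
    return np.hstack([acoustic, prosodic])


def fit_pca(features, out_dim):
    """Estimate a PCA basis (an assumed stand-in for the paper's projection)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:out_dim].T


def project(features, mean, basis):
    """Project feature vectors onto the lower-dimensional subspace."""
    return (features - mean) @ basis


# Hypothetical data: 200 frames, 13 acoustic coefficients (e.g. MFCC) and
# 3 prosodic variables (loudness, voice-intensity, voice-probability).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 13))
prosody = rng.normal(size=(200, 3))

combined = combine_features(mfcc, prosody)
mean, basis = fit_pca(combined, out_dim=10)
reduced = project(combined, mean, basis)

print(combined.shape, reduced.shape)
```

In a mismatched train/test setting such as the one described, the projection would presumably be estimated on the adult training features and then applied unchanged to the children's test features; that choice is an assumption here, not something the summary states.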