
Role of Prosodic Features on Children's Speech Recognition

In this paper, we have explored the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and pe...

Bibliographic Details
Main Authors: Kathania, Hemant K., Shahnawazuddin, S., Adiga, Nagaraj, Ahmad, Waquar
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 5523
container_issue
container_start_page 5519
container_title
container_volume
creator Kathania, Hemant K.
Shahnawazuddin, S.
Adiga, Nagaraj
Ahmad, Waquar
description In this paper, we have explored the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC) while the considered prosodic variables are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, given that the textual content remains the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech. At the same time, the contours differ a lot when the context is different. Consequently, inclusion of prosodic information reduces the inter-speaker differences and increases the class discrimination. This subsequently improves the recognition performance. Further improvements are obtained by projecting the feature vectors obtained by combining the two features to a lower-dimensional subspace. The same has been experimentally verified in this study for mismatched speech recognition using deep neural network (DNN) based system. On combining MFCC (PLPCC) and prosodic features, a relative improvement of 16% (14%) is noted on decoding children's speech using adult data trained DNN models.
doi_str_mv 10.1109/ICASSP.2018.8461668
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_8461668</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8461668</ieee_id><sourcerecordid>8461668</sourcerecordid><originalsourceid>FETCH-LOGICAL-i175t-552a361abcbcca33bbdd1383539ddcc01bf7542492822a02c568510bd87b058c3</originalsourceid><addsrcrecordid>eNotj09LwzAcQKMgOGY_wS65eWrNL2nSX7xJcU4YOFYFbyP_6iK1GU09-O0V3OndHu8RsgJWATB999w-dN2u4gywwlqBUnhBCt0gSIGqVhLxkiy4aHQJmr1fkyLnT8YYV1g3Ui3I_T4Ngaae7qaUk4-OroOZv6eQaRppe4yDn8J4m2l3CsEd6T649DHGOabxhlz1ZsihOHNJ3taPr-2m3L48_XVtywiNnEspuREKjHXWOSOEtd6DQCGF9t45BrZvZM1rzZFzw7iTCiUw67GxTKITS7L698YQwuE0xS8z_RzOt-IX_SRIFg</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Role of Prosodic Features on Children's Speech Recognition</title><source>IEEE Xplore All Conference Series</source><creator>Kathania, Hemant K. ; Shahnawazuddin, S. ; Adiga, Nagaraj ; Ahmad, Waquar</creator><creatorcontrib>Kathania, Hemant K. ; Shahnawazuddin, S. ; Adiga, Nagaraj ; Ahmad, Waquar</creatorcontrib><description>In this paper, we have explored the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC) while the considered prosodic variables are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, given that the textual content remains the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech. At the same time, the contours differ a lot when the context is different. 
Consequently, inclusion of prosodic information reduces the inter-speaker differences and increases the class discrimination. This subsequently improves the recognition performance. Further improvements are obtained by projecting the feature vectors obtained by combining the two features to a lower-dimensional subspace. The same has been experimentally verified in this study for mismatched speech recognition using deep neural network (DNN) based system. On combining MFCC (PLPCC) and prosodic features, a relative improvement of 16% (14%) is noted on decoding children's speech using adult data trained DNN models.</description><identifier>EISSN: 2379-190X</identifier><identifier>EISBN: 9781538646588</identifier><identifier>EISBN: 1538646587</identifier><identifier>DOI: 10.1109/ICASSP.2018.8461668</identifier><language>eng</language><publisher>IEEE</publisher><subject>acoustic mismatch ; Children's ASR ; Feature extraction ; feature projection ; Hidden Markov models ; Mel frequency cepstral coefficient ; prosodic variables ; Speech recognition ; Task analysis</subject><ispartof>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, p.5519-5523</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8461668$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27925,54555,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/8461668$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Kathania, Hemant K.</creatorcontrib><creatorcontrib>Shahnawazuddin, S.</creatorcontrib><creatorcontrib>Adiga, Nagaraj</creatorcontrib><creatorcontrib>Ahmad, Waquar</creatorcontrib><title>Role of Prosodic Features on Children's Speech 
Recognition</title><title>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title><addtitle>ICASSP</addtitle><description>In this paper, we have explored the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC) while the considered prosodic variables are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, given that the textual content remains the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech. At the same time, the contours differ a lot when the context is different. Consequently, inclusion of prosodic information reduces the inter-speaker differences and increases the class discrimination. This subsequently improves the recognition performance. Further improvements are obtained by projecting the feature vectors obtained by combining the two features to a lower-dimensional subspace. The same has been experimentally verified in this study for mismatched speech recognition using deep neural network (DNN) based system. 
On combining MFCC (PLPCC) and prosodic features, a relative improvement of 16% (14%) is noted on decoding children's speech using adult data trained DNN models.</description><subject>acoustic mismatch</subject><subject>Children's ASR</subject><subject>Feature extraction</subject><subject>feature projection</subject><subject>Hidden Markov models</subject><subject>Mel frequency cepstral coefficient</subject><subject>prosodic variables</subject><subject>Speech recognition</subject><subject>Task analysis</subject><issn>2379-190X</issn><isbn>9781538646588</isbn><isbn>1538646587</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2018</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotj09LwzAcQKMgOGY_wS65eWrNL2nSX7xJcU4YOFYFbyP_6iK1GU09-O0V3OndHu8RsgJWATB999w-dN2u4gywwlqBUnhBCt0gSIGqVhLxkiy4aHQJmr1fkyLnT8YYV1g3Ui3I_T4Ngaae7qaUk4-OroOZv6eQaRppe4yDn8J4m2l3CsEd6T649DHGOabxhlz1ZsihOHNJ3taPr-2m3L48_XVtywiNnEspuREKjHXWOSOEtd6DQCGF9t45BrZvZM1rzZFzw7iTCiUw67GxTKITS7L698YQwuE0xS8z_RzOt-IX_SRIFg</recordid><startdate>201804</startdate><enddate>201804</enddate><creator>Kathania, Hemant K.</creator><creator>Shahnawazuddin, S.</creator><creator>Adiga, Nagaraj</creator><creator>Ahmad, Waquar</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>201804</creationdate><title>Role of Prosodic Features on Children's Speech Recognition</title><author>Kathania, Hemant K. ; Shahnawazuddin, S. 
; Adiga, Nagaraj ; Ahmad, Waquar</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i175t-552a361abcbcca33bbdd1383539ddcc01bf7542492822a02c568510bd87b058c3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2018</creationdate><topic>acoustic mismatch</topic><topic>Children's ASR</topic><topic>Feature extraction</topic><topic>feature projection</topic><topic>Hidden Markov models</topic><topic>Mel frequency cepstral coefficient</topic><topic>prosodic variables</topic><topic>Speech recognition</topic><topic>Task analysis</topic><toplevel>online_resources</toplevel><creatorcontrib>Kathania, Hemant K.</creatorcontrib><creatorcontrib>Shahnawazuddin, S.</creatorcontrib><creatorcontrib>Adiga, Nagaraj</creatorcontrib><creatorcontrib>Ahmad, Waquar</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE/IET Electronic Library</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kathania, Hemant K.</au><au>Shahnawazuddin, S.</au><au>Adiga, Nagaraj</au><au>Ahmad, Waquar</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Role of Prosodic Features on Children's Speech Recognition</atitle><btitle>2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</btitle><stitle>ICASSP</stitle><date>2018-04</date><risdate>2018</risdate><spage>5519</spage><epage>5523</epage><pages>5519-5523</pages><eissn>2379-190X</eissn><eisbn>9781538646588</eisbn><eisbn>1538646587</eisbn><abstract>In this paper, we have explored the role of combining prosodic variables with the 
existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC) while the considered prosodic variables are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, given that the textual content remains the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech. At the same time, the contours differ a lot when the context is different. Consequently, inclusion of prosodic information reduces the inter-speaker differences and increases the class discrimination. This subsequently improves the recognition performance. Further improvements are obtained by projecting the feature vectors obtained by combining the two features to a lower-dimensional subspace. The same has been experimentally verified in this study for mismatched speech recognition using deep neural network (DNN) based system. On combining MFCC (PLPCC) and prosodic features, a relative improvement of 16% (14%) is noted on decoding children's speech using adult data trained DNN models.</abstract><pub>IEEE</pub><doi>10.1109/ICASSP.2018.8461668</doi><tpages>5</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2379-190X
ispartof 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, p.5519-5523
issn 2379-190X
language eng
recordid cdi_ieee_primary_8461668
source IEEE Xplore All Conference Series
subjects acoustic mismatch
Children's ASR
Feature extraction
feature projection
Hidden Markov models
Mel frequency cepstral coefficient
prosodic variables
Speech recognition
Task analysis
title Role of Prosodic Features on Children's Speech Recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T12%3A05%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Role%20of%20Prosodic%20Features%20on%20Children's%20Speech%20Recognition&rft.btitle=2018%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Kathania,%20Hemant%20K.&rft.date=2018-04&rft.spage=5519&rft.epage=5523&rft.pages=5519-5523&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2018.8461668&rft.eisbn=9781538646588&rft.eisbn_list=1538646587&rft_dat=%3Cieee_CHZPO%3E8461668%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-552a361abcbcca33bbdd1383539ddcc01bf7542492822a02c568510bd87b058c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8461668&rfr_iscdi=true
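
The description field above reports two things that a small example may make concrete: prosodic variables are combined with the acoustic feature vectors, and the gain is quoted as a relative improvement. The sketch below is illustrative only and is not the authors' implementation. The function names, the 13-dimensional acoustic vectors, the frame-by-frame concatenation, and the use of word error rate with the numbers 25.0 and 21.0 are all assumptions chosen for the example; the record itself gives only the 16% (14%) relative figures.

```python
from typing import List, Sequence


def append_prosody(
    acoustic: Sequence[Sequence[float]],
    prosodic: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate per-frame acoustic and prosodic vectors (hypothetical helper).

    Each frame's prosodic values (for example loudness, voice-intensity and
    voice-probability) are appended to that frame's MFCC or PLPCC vector.
    """
    if len(acoustic) != len(prosodic):
        raise ValueError("acoustic and prosodic streams must have the same frame count")
    return [list(a) + list(p) for a, p in zip(acoustic, prosodic)]


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Relative reduction in an error rate, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Two frames of made-up 13-dimensional acoustic features (assumed size).
acoustic_frames = [[0.0] * 13, [1.0] * 13]
# Matching made-up prosodic values: loudness, voice-intensity, voice-probability.
prosodic_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

combined = append_prosody(acoustic_frames, prosodic_frames)
print(len(combined), len(combined[0]))

# Hypothetical error rates, chosen only to illustrate the arithmetic.
print(relative_improvement(25.0, 21.0))
```

The lower-dimensional projection mentioned in the description is left out here because the record does not say which projection the authors used.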