Role of Prosodic Features on Children's Speech Recognition
In this paper, we have explored the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and pe...
Main Authors: | Kathania, Hemant K.; Shahnawazuddin, S.; Adiga, Nagaraj; Ahmad, Waquar |
---|---|
Format: | Conference Proceeding |
Language: | English |
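Subjects: | acoustic mismatch; Children's ASR; Feature extraction; feature projection; Hidden Markov models; Mel frequency cepstral coefficient; prosodic variables; Speech recognition; Task analysis |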
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 5523 |
container_issue | |
container_start_page | 5519 |
container_title | |
container_volume | |
creator | Kathania, Hemant K.; Shahnawazuddin, S.; Adiga, Nagaraj; Ahmad, Waquar |
description | In this paper, we explore the role of combining prosodic variables with the existing acoustic features in the context of children's speech recognition using acoustic models trained on adults' speech. The explored acoustic features are Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction cepstral coefficients (PLPCC), while the considered prosodic variables are loudness, voice-intensity and voice-probability. An analysis presented in this paper shows that, when the textual content is the same, the considered prosodic variables exhibit very similar contours for adults' and children's speech. At the same time, the contours differ considerably when the context is different. Consequently, including prosodic information reduces inter-speaker differences and increases class discrimination, which in turn improves recognition performance. Further improvements are obtained by projecting the feature vectors formed by combining the two feature types onto a lower-dimensional subspace. These findings are experimentally verified in this study for mismatched speech recognition using a deep neural network (DNN) based system. Combining MFCC (PLPCC) with prosodic features yields a relative improvement of 16% (14%) when decoding children's speech using DNN models trained on adult data. |
doi_str_mv | 10.1109/ICASSP.2018.8461668 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2379-190X |
ispartof | 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, p.5519-5523 |
issn | 2379-190X |
language | eng |
recordid | cdi_ieee_primary_8461668 |
source | IEEE Xplore All Conference Series |
subjects | acoustic mismatch; Children's ASR; Feature extraction; feature projection; Hidden Markov models; Mel frequency cepstral coefficient; prosodic variables; Speech recognition; Task analysis |
title | Role of Prosodic Features on Children's Speech Recognition |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T12%3A05%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Role%20of%20Prosodic%20Features%20on%20Children's%20Speech%20Recognition&rft.btitle=2018%20IEEE%20International%20Conference%20on%20Acoustics,%20Speech%20and%20Signal%20Processing%20(ICASSP)&rft.au=Kathania,%20Hemant%20K.&rft.date=2018-04&rft.spage=5519&rft.epage=5523&rft.pages=5519-5523&rft.eissn=2379-190X&rft_id=info:doi/10.1109/ICASSP.2018.8461668&rft.eisbn=9781538646588&rft.eisbn_list=1538646587&rft_dat=%3Cieee_CHZPO%3E8461668%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i175t-552a361abcbcca33bbdd1383539ddcc01bf7542492822a02c568510bd87b058c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=8461668&rfr_iscdi=true |
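
The description above mentions two steps that can be illustrated in code: combining acoustic and prosodic features per frame, then projecting the combined vectors onto a lower-dimensional subspace. The following is only a minimal sketch under stated assumptions, not the authors' pipeline. The record does not say which projection method was used, so PCA via SVD is swapped in here as a stand-in. The function names, the feature dimensions (13 acoustic, 3 prosodic), the random data, and the error rates passed to relative_improvement are all hypothetical.

```python
import numpy as np


def combine_and_project(acoustic, prosodic, out_dim):
    """Concatenate per-frame features, then reduce dimension with PCA.

    PCA is an assumed stand-in; the record does not name the projection.
    """
    # Stack the two feature streams frame by frame: (frames, d_a + d_p).
    combined = np.hstack([acoustic, prosodic])
    # Centre each dimension before computing principal directions.
    centered = combined - combined.mean(axis=0)
    # Rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the first out_dim directions: (frames, out_dim).
    return centered @ vt[:out_dim].T


def relative_improvement(baseline_error, new_error):
    """Relative reduction in an error rate, in percent."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical data: 200 frames, 13 acoustic and 3 prosodic values each.
rng = np.random.default_rng(0)
acoustic = rng.normal(size=(200, 13))
prosodic = rng.normal(size=(200, 3))

projected = combine_and_project(acoustic, prosodic, out_dim=10)
print(projected.shape)
```

A relative improvement such as the 16% quoted in the description is a ratio against the baseline error, so relative_improvement(20.0, 16.8) would give roughly 16 for those made-up error rates; the record itself does not list the underlying error rates.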