A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering
Nowadays, the demand for language learning is increasing because people need to communicate with other people belonging to different regions for their business deals, study, etc. During language learning, a lot of pronunciation mistakes occur due to unfamiliarity with a new language and differences...
Published in: | Multimedia systems 2023-06, Vol.29 (3), p.1699-1715 |
---|---|
Main Authors: | Nazir, Faria ; Majeed, Muhammad Nadeem ; Ghazanfar, Mustansar Ali ; Maqsood, Muazzam |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053 |
---|---|
cites | cdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053 |
container_end_page | 1715 |
container_issue | 3 |
container_start_page | 1699 |
container_title | Multimedia systems |
container_volume | 29 |
creator | Nazir, Faria ; Majeed, Muhammad Nadeem ; Ghazanfar, Mustansar Ali ; Maqsood, Muazzam |
description | Demand for language learning is increasing because people need to communicate with others from different regions for business, study, and other purposes. During language learning, many pronunciation mistakes occur because of unfamiliarity with the new language and differences in accent. In this paper, we analyze speech mistakes using deep feature-based clustering. We propose two novel methods for speech analysis: one for phonemic errors (confusing phonemes) and one for prosodic errors (partially changed pronunciation variations of phones). For accurate and efficient language learning, it is important to learn to correct both phonemic and prosodic errors. In our first method, we combine deep CNN features with a clustering algorithm to detect phonemic errors. We classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM) classifiers. We perform experiments on the six most frequently mispronounced confusing pairs in Arabic and achieve an accuracy of 94%. In our second method, we propose the unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone at different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis, provide feedback based on the variation of each phone, and achieve an accuracy of 97%. |
doi_str_mv | 10.1007/s00530-021-00822-5 |
format | article |
publisher | Berlin/Heidelberg: Springer Berlin Heidelberg |
rights | The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021 |
orcid | https://orcid.org/0000-0002-2709-0849 |
date | 2023-06-01 |
tpages | 17 |
toplevel | peer_reviewed online_resources |
fulltext | fulltext |
identifier | ISSN: 0942-4962 |
ispartof | Multimedia systems, 2023-06, Vol.29 (3), p.1699-1715 |
issn | 0942-4962 1432-1882 |
language | eng |
recordid | cdi_proquest_journals_2821009408 |
source | Springer Nature |
subjects | Algorithms ; Clustering ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Error correction ; Feedback ; Learning ; Linguistics ; Multimedia Information Systems ; Operating Systems ; Phonemes ; Role of Deep Learning Models & Analytics in Industrial Multimedia Environment ; Special Issue Paper ; Speech ; Support vector machines |
title | A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T22%3A33%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20computer-aided%20speech%20analytics%20approach%20for%20pronunciation%20feedback%20using%20deep%20feature%20clustering&rft.jtitle=Multimedia%20systems&rft.au=Nazir,%20Faria&rft.date=2023-06-01&rft.volume=29&rft.issue=3&rft.spage=1699&rft.epage=1715&rft.pages=1699-1715&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-021-00822-5&rft_dat=%3Cproquest_cross%3E2821009408%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2821009408&rft_id=info:pmid/&rfr_iscdi=true |
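
The description above mentions classifying phonemes with a K-nearest neighbor classifier applied to deep CNN features. The record gives no implementation details, so the following is only a minimal sketch of the general K-nearest neighbor idea, not the authors' method. The function name `knn_predict`, the two-dimensional feature vectors, and the labels `phoneme_a` and `phoneme_b` are all hypothetical placeholders; real CNN features would have far more dimensions.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    # Sort training examples by Euclidean distance to the query and keep the k closest.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D stand-ins for deep features of two confusable phonemes
# (values are made up for illustration, not taken from the paper).
train = [
    ((0.1, 0.2), "phoneme_a"),
    ((0.2, 0.1), "phoneme_a"),
    ((0.0, 0.3), "phoneme_a"),
    ((0.9, 0.8), "phoneme_b"),
    ((0.8, 0.9), "phoneme_b"),
    ((1.0, 0.7), "phoneme_b"),
]

prediction = knn_predict(train, (0.15, 0.15), k=3)
```

Using an odd `k` with two classes avoids tied votes; with more classes, a tie-breaking rule would need to be chosen explicitly.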