A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering

Demand for language learning is growing because people need to communicate across regions for business, study, and other purposes. Learners make many pronunciation mistakes owing to unfamiliarity with a new language and differences...

Bibliographic Details
Published in: Multimedia systems, 2023-06, Vol. 29 (3), p. 1699-1715
Main Authors: Nazir, Faria; Majeed, Muhammad Nadeem; Ghazanfar, Mustansar Ali; Maqsood, Muazzam
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053
cites cdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053
container_end_page 1715
container_issue 3
container_start_page 1699
container_title Multimedia systems
container_volume 29
creator Nazir, Faria
Majeed, Muhammad Nadeem
Ghazanfar, Mustansar Ali
Maqsood, Muazzam
description Demand for language learning is growing because people need to communicate across regions for business, study, and other purposes. Learners make many pronunciation mistakes owing to unfamiliarity with a new language and differences in accent. In this paper, we analyze speech mistakes using deep feature-based clustering. We propose two novel methods for speech analysis: one for phonemic errors (confusing phonemes) and one for prosodic errors (partially changed pronunciation variations of phones). For accurate and efficient language learning, it is important to learn to correct both phonemic and prosodic errors. In our first method, we combine deep CNN features with a clustering algorithm to detect phonemic errors, and we classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM) classifiers. In experiments on the six most frequently mispronounced confusing pairs in Arabic, this method achieves an accuracy of 94%. In our second method, we propose the unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone at different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis and provide feedback based on the variation of each phone, achieving an accuracy of 97%.
doi_str_mv 10.1007/s00530-021-00822-5
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2821009408</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2821009408</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053</originalsourceid><addsrcrecordid>eNp9kM1OwzAQhC0EEqXwApwscTas7TiJj1XFn1SJC5wtx15DSpsEOzn07XEJEjdOuxrNjnY-Qq453HKA6i4BKAkMBGcAtRBMnZAFL6RgvK7FKVmALgQrdCnOyUVKWwBelRIWxK2o6_fDNGJktvXoaRoQ3Qe1nd0dxtYlaoch9jZLoY80r93UudaObd_RgOgb6z7plNrunXrEIWt2nCJSt5tSTs36JTkLdpfw6ncuydvD_ev6iW1eHp_Xqw1zkuuRSRWEr1XlFGrtnap4WUmvveVYhqLmqqkkNCVqCApciYUSQnOhS--qwHP_JbmZc_OTXxOm0Wz7KeYeyYhaZE66gDq7xOxysU8pYjBDbPc2HgwHc4RpZpgmwzQ_MM0xWs5HaTg2wvgX_c_VN4GJd8c</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2821009408</pqid></control><display><type>article</type><title>A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering</title><source>Springer Nature</source><creator>Nazir, Faria ; Majeed, Muhammad Nadeem ; Ghazanfar, Mustansar Ali ; Maqsood, Muazzam</creator><creatorcontrib>Nazir, Faria ; Majeed, Muhammad Nadeem ; Ghazanfar, Mustansar Ali ; Maqsood, Muazzam</creatorcontrib><description>Nowadays, the demand for language learning is increasing because people need to communicate with other people belonging to different regions for their business deals, study, etc. During language learning, a lot of pronunciation mistakes occur due to unfamiliarity with a new language and differences in accent. In this paper, we perform speech mistakes analysis using deep feature-based clustering. We proposed two novel methods for speech analysis, one to deal with phonemic errors (confusing phonemes) and the other to deal with the prosodic errors (partially changed pronunciation variation of phones). 
For accurate and efficient language learning, it is important to learn both phonemic as well as prosodic error corrections. In our first method, we perform speech analysis by combining deep CNN features and clustering algorithm to detect the phonemic errors. We classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM). We perform experiments on the six most frequently mispronounced confusing pairs of Arabic to handle phonemic errors and achieve an accuracy of 94%. In our second method, we proposed the unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone with different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis and provide feedback based on the variation of each phone and achieves an accuracy of 97%.</description><identifier>ISSN: 0942-4962</identifier><identifier>EISSN: 1432-1882</identifier><identifier>DOI: 10.1007/s00530-021-00822-5</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Clustering ; Computer Communication Networks ; Computer Graphics ; Computer Science ; Cryptology ; Data Storage Representation ; Error correction ; Feedback ; Learning ; Linguistics ; Multimedia Information Systems ; Operating Systems ; Phonemes ; Role of Deep Learning Models &amp; Analytics in Industrial Multimedia Environment ; Special Issue Paper ; Speech ; Support vector machines</subject><ispartof>Multimedia systems, 2023-06, Vol.29 (3), p.1699-1715</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2021</rights><rights>The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 
2021.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053</citedby><cites>FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053</cites><orcidid>0000-0002-2709-0849</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Nazir, Faria</creatorcontrib><creatorcontrib>Majeed, Muhammad Nadeem</creatorcontrib><creatorcontrib>Ghazanfar, Mustansar Ali</creatorcontrib><creatorcontrib>Maqsood, Muazzam</creatorcontrib><title>A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering</title><title>Multimedia systems</title><addtitle>Multimedia Systems</addtitle><description>Nowadays, the demand for language learning is increasing because people need to communicate with other people belonging to different regions for their business deals, study, etc. During language learning, a lot of pronunciation mistakes occur due to unfamiliarity with a new language and differences in accent. In this paper, we perform speech mistakes analysis using deep feature-based clustering. We proposed two novel methods for speech analysis, one to deal with phonemic errors (confusing phonemes) and the other to deal with the prosodic errors (partially changed pronunciation variation of phones). For accurate and efficient language learning, it is important to learn both phonemic as well as prosodic error corrections. In our first method, we perform speech analysis by combining deep CNN features and clustering algorithm to detect the phonemic errors. We classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM). 
We perform experiments on the six most frequently mispronounced confusing pairs of Arabic to handle phonemic errors and achieve an accuracy of 94%. In our second method, we proposed the unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone with different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis and provide feedback based on the variation of each phone and achieves an accuracy of 97%.</description><subject>Algorithms</subject><subject>Clustering</subject><subject>Computer Communication Networks</subject><subject>Computer Graphics</subject><subject>Computer Science</subject><subject>Cryptology</subject><subject>Data Storage Representation</subject><subject>Error correction</subject><subject>Feedback</subject><subject>Learning</subject><subject>Linguistics</subject><subject>Multimedia Information Systems</subject><subject>Operating Systems</subject><subject>Phonemes</subject><subject>Role of Deep Learning Models &amp; Analytics in Industrial Multimedia Environment</subject><subject>Special Issue Paper</subject><subject>Speech</subject><subject>Support vector machines</subject><issn>0942-4962</issn><issn>1432-1882</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAQhC0EEqXwApwscTas7TiJj1XFn1SJC5wtx15DSpsEOzn07XEJEjdOuxrNjnY-Qq453HKA6i4BKAkMBGcAtRBMnZAFL6RgvK7FKVmALgQrdCnOyUVKWwBelRIWxK2o6_fDNGJktvXoaRoQ3Qe1nd0dxtYlaoch9jZLoY80r93UudaObd_RgOgb6z7plNrunXrEIWt2nCJSt5tSTs36JTkLdpfw6ncuydvD_ev6iW1eHp_Xqw1zkuuRSRWEr1XlFGrtnap4WUmvveVYhqLmqqkkNCVqCApciYUSQnOhS--qwHP_JbmZc_OTXxOm0Wz7KeYeyYhaZE66gDq7xOxysU8pYjBDbPc2HgwHc4RpZpgmwzQ_MM0xWs5HaTg2wvgX_c_VN4GJd8c</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Nazir, Faria</creator><creator>Majeed, Muhammad Nadeem</creator><creator>Ghazanfar, 
Mustansar Ali</creator><creator>Maqsood, Muazzam</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0002-2709-0849</orcidid></search><sort><creationdate>20230601</creationdate><title>A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering</title><author>Nazir, Faria ; Majeed, Muhammad Nadeem ; Ghazanfar, Mustansar Ali ; Maqsood, Muazzam</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Algorithms</topic><topic>Clustering</topic><topic>Computer Communication Networks</topic><topic>Computer Graphics</topic><topic>Computer Science</topic><topic>Cryptology</topic><topic>Data Storage Representation</topic><topic>Error correction</topic><topic>Feedback</topic><topic>Learning</topic><topic>Linguistics</topic><topic>Multimedia Information Systems</topic><topic>Operating Systems</topic><topic>Phonemes</topic><topic>Role of Deep Learning Models &amp; Analytics in Industrial Multimedia Environment</topic><topic>Special Issue Paper</topic><topic>Speech</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Nazir, Faria</creatorcontrib><creatorcontrib>Majeed, Muhammad Nadeem</creatorcontrib><creatorcontrib>Ghazanfar, Mustansar Ali</creatorcontrib><creatorcontrib>Maqsood, Muazzam</creatorcontrib><collection>CrossRef</collection><jtitle>Multimedia systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Nazir, Faria</au><au>Majeed, Muhammad Nadeem</au><au>Ghazanfar, Mustansar Ali</au><au>Maqsood, 
Muazzam</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering</atitle><jtitle>Multimedia systems</jtitle><stitle>Multimedia Systems</stitle><date>2023-06-01</date><risdate>2023</risdate><volume>29</volume><issue>3</issue><spage>1699</spage><epage>1715</epage><pages>1699-1715</pages><issn>0942-4962</issn><eissn>1432-1882</eissn><abstract>Nowadays, the demand for language learning is increasing because people need to communicate with other people belonging to different regions for their business deals, study, etc. During language learning, a lot of pronunciation mistakes occur due to unfamiliarity with a new language and differences in accent. In this paper, we perform speech mistakes analysis using deep feature-based clustering. We proposed two novel methods for speech analysis, one to deal with phonemic errors (confusing phonemes) and the other to deal with the prosodic errors (partially changed pronunciation variation of phones). For accurate and efficient language learning, it is important to learn both phonemic as well as prosodic error corrections. In our first method, we perform speech analysis by combining deep CNN features and clustering algorithm to detect the phonemic errors. We classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM). We perform experiments on the six most frequently mispronounced confusing pairs of Arabic to handle phonemic errors and achieve an accuracy of 94%. In our second method, we proposed the unsupervised phone variation model (PVM) to detect prosodic errors. In PVM, each phone is extended to represent the different types of pronunciation variation of that phone with different proficiency levels. 
We use an Arabic dataset of 28 individual phones for speech analysis and provide feedback based on the variation of each phone and achieves an accuracy of 97%.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00530-021-00822-5</doi><tpages>17</tpages><orcidid>https://orcid.org/0000-0002-2709-0849</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0942-4962
ispartof Multimedia systems, 2023-06, Vol.29 (3), p.1699-1715
issn 0942-4962
1432-1882
language eng
recordid cdi_proquest_journals_2821009408
source Springer Nature
subjects Algorithms
Clustering
Computer Communication Networks
Computer Graphics
Computer Science
Cryptology
Data Storage Representation
Error correction
Feedback
Learning
Linguistics
Multimedia Information Systems
Operating Systems
Phonemes
Role of Deep Learning Models & Analytics in Industrial Multimedia Environment
Special Issue Paper
Speech
Support vector machines
title A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T22%3A33%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20computer-aided%20speech%20analytics%20approach%20for%20pronunciation%20feedback%20using%20deep%20feature%20clustering&rft.jtitle=Multimedia%20systems&rft.au=Nazir,%20Faria&rft.date=2023-06-01&rft.volume=29&rft.issue=3&rft.spage=1699&rft.epage=1715&rft.pages=1699-1715&rft.issn=0942-4962&rft.eissn=1432-1882&rft_id=info:doi/10.1007/s00530-021-00822-5&rft_dat=%3Cproquest_cross%3E2821009408%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c319t-35f2d857c5e99dc571673d9da1e6f4815b730b6e90f50c6e452291296dc7f1053%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2821009408&rft_id=info:pmid/&rfr_iscdi=true
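
The description field above mentions two ideas: classifying phonemes from feature vectors with K-nearest neighbor, and grouping the pronunciation variations of a single phone by clustering. The record does not say which clustering algorithm, feature dimensions, or parameters the authors used, so the following is only a minimal sketch under stated assumptions: hand-written 2-D vectors stand in for deep CNN features, plain k-means stands in for the unspecified clustering step, and all names (`knn_predict`, `kmeans`, `phone_a`, `phone_b`) are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    ranked = sorted(
        zip(train_features, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. The first k points seed the centroids so the
    result is deterministic (an illustrative choice, not from the paper)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Toy stand-ins for deep features of a confusing phoneme pair (hypothetical).
train_features = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
                  (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_labels = ["phone_a"] * 3 + ["phone_b"] * 3

# Toy samples of one phone spoken at two proficiency levels (hypothetical).
variation_samples = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)]

if __name__ == "__main__":
    print(knn_predict(train_features, train_labels, (0.2, 0.2)))
    centroids, assignment = kmeans(variation_samples, k=2)
    print(assignment)
```

In a real system the feature vectors would come from a trained network and the number of variation clusters per phone would need to be chosen or learned; the sketch only shows how the two steps fit together.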