Fuzzy Phoneme Classification Using Multi-speaker Vocal Tract Length Normalization
The overall success of automatic speech recognition (ASR) depends on efficient phoneme recognition performance and quality of speech signal received in ASR. However, dissimilar inputs of speakers affect the overall recognition performance. One of the main problems that affect recognition performance...
Published in: | Technical review - IETE 2014-03, Vol.31 (2), p.128-136 |
---|---|
Main Authors: | Lung, Jensen Wong Jing; Salam, Md. Sah Hj; Rehman, Amjad; Rahim, Mohd Shafry Mohd; Saba, Tanzila |
Format: | Article |
Language: | English |
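Subjects: | Fuzzy phoneme recognition; Inter-speaker variability; Multi-speaker frequency warping; Vocal tract length normalization; Warp factor |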
cited_by | cdi_FETCH-LOGICAL-c307t-53dbd93eb3e136f17f3a86e8ff527586092672a039672a9b5925e454036e2b7b3 |
---|---|
cites | cdi_FETCH-LOGICAL-c307t-53dbd93eb3e136f17f3a86e8ff527586092672a039672a9b5925e454036e2b7b3 |
container_end_page | 136 |
container_issue | 2 |
container_start_page | 128 |
container_title | Technical review - IETE |
container_volume | 31 |
creator | Lung, Jensen Wong Jing; Salam, Md. Sah Hj; Rehman, Amjad; Rahim, Mohd Shafry Mohd; Saba, Tanzila |
description | The overall success of automatic speech recognition (ASR) depends on efficient phoneme recognition and on the quality of the speech signal the ASR system receives. However, differences between speakers' inputs degrade overall recognition performance; one of the main problems is inter-speaker variability. Vocal tract length normalization (VTLN) compensates for inter-speaker variation in the speech signal by applying speaker-specific warping to the frequency scale of a filter bank. Instead of measuring performance at the word level with speaker-specific warping, this research works directly at the phoneme level and applies VTLN to all speakers' speech signals to find the setting that gives the highest recognition performance. It compares phoneme recognition results for warping factors between 0.74 and 1.54, in increments of 0.02, over nine different frequency warping ranges. The warp factor and frequency warping range that give the highest phoneme recognition performance are then applied to word recognition. Relative to baseline results, a warp factor of 1.40 on the 300-5000 Hz frequency range improves phoneme recognition by 0.7% and spoken word recognition by 0.5%. |
doi_str_mv | 10.1080/02564602.2014.892669 |
format | article |
publisher | Taylor & Francis |
rights | 2014 Taylor & Francis |
date | 2014-03-04 |
eissn | 0974-5971 |
tpages | 9 |
toplevel | peer_reviewed online_resources |
fulltext | fulltext |
identifier | ISSN: 0256-4602 |
ispartof | Technical review - IETE, 2014-03, Vol.31 (2), p.128-136 |
issn | 0256-4602 0974-5971 |
language | eng |
recordid | cdi_crossref_primary_10_1080_02564602_2014_892669 |
source | Taylor and Francis Science and Technology Collection |
subjects | Fuzzy phoneme recognition; Inter-speaker variability; Multi-speaker frequency warping; Vocal tract length normalization; Warp factor |
title | Fuzzy Phoneme Classification Using Multi-speaker Vocal Tract Length Normalization |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T06%3A04%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_infor&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Fuzzy%20Phoneme%20Classification%20Using%20Multi-speaker%20Vocal%20Tract%20Length%20Normalization&rft.jtitle=Technical%20review%20-%20IETE&rft.au=Lung,%20Jensen%20Wong%20Jing&rft.date=2014-03-04&rft.volume=31&rft.issue=2&rft.spage=128&rft.epage=136&rft.pages=128-136&rft.issn=0256-4602&rft.eissn=0974-5971&rft_id=info:doi/10.1080/02564602.2014.892669&rft_dat=%3Ccrossref_infor%3E10_1080_02564602_2014_892669%3C/crossref_infor%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c307t-53dbd93eb3e136f17f3a86e8ff527586092672a039672a9b5925e454036e2b7b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
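
The description field reports a sweep over warp factors from 0.74 to 1.54 in 0.02 steps, with warping confined to a frequency range such as 300-5000 Hz. The sketch below illustrates that idea only. The record does not state the article's actual warping function, so the piecewise-linear form, the `knee` parameter, and the function names `warp_factor_grid` and `warp_frequency` are all assumptions made for illustration.

```python
def warp_factor_grid(start_hundredths=74, stop_hundredths=154, step_hundredths=2):
    """Return the warp factors 0.74, 0.76, ..., 1.54.

    Integer hundredths are used so the grid does not accumulate
    floating-point error from repeatedly adding 0.02.
    """
    return [h / 100 for h in range(start_hundredths, stop_hundredths + 1, step_hundredths)]


def warp_frequency(f, alpha, f_low=300.0, f_high=5000.0, knee=0.85):
    """Piecewise-linear frequency warp restricted to [f_low, f_high].

    ASSUMPTION: this shape is a common textbook-style choice, not the
    function used in the article. Frequencies outside the band are left
    unchanged. Inside the band, the lower part is scaled by `alpha`, and
    a second linear segment brings the curve back to f_high so the map
    stays continuous and strictly increasing.
    """
    if f <= f_low or f >= f_high:
        return f
    width = f_high - f_low
    # Input frequency at which the alpha-scaled segment ends.
    f_break = f_low + knee * width / max(alpha, 1.0)
    if f <= f_break:
        return f_low + alpha * (f - f_low)
    # Second segment joins (f_break, warped_break) to (f_high, f_high).
    warped_break = f_low + alpha * (f_break - f_low)
    slope = (f_high - warped_break) / (f_high - f_break)
    return warped_break + slope * (f - f_break)


# Example usage: warp a few filter-bank centre frequencies (hypothetical
# values) with the warp factor the description reports as best, 1.40.
centre_frequencies = [250.0, 500.0, 1000.0, 2000.0, 4000.0, 6000.0]
warped = [warp_frequency(f, 1.40) for f in centre_frequencies]
```

In a sweep like the one described, each value from `warp_factor_grid()` would be paired with each candidate frequency range, the features recomputed with the warped filter bank, and the recognizer scored. That evaluation loop is omitted here because the record gives no details of the recognizer or data.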