
Learning pronunciation variation: A data-driven approach to rule-based lexicon adaptation for automatic speech recognition

Bibliographic Details
Main Author: Amdal, Ingunn
Format: Dissertation
Language: English
creator Amdal, Ingunn
description To achieve a robust system, the variation seen for different speaking styles must be handled. An investigation of standard automatic speech recognition techniques for different speaking styles showed that lexical modelling using general-purpose variants gave small improvements, but the errors differed compared with using only one canonical pronunciation per word. Modelling the variation using the acoustic models (using context dependency and/or speaker-dependent adaptation) gave a significant improvement, but the resulting performance for non-native and spontaneous speech was still far from that of read speech. In this dissertation, a complete data-driven approach to rule-based lexicon adaptation is presented, where the effect of the acoustic models is incorporated in the rule pruning metric. Reference and alternative transcriptions were aligned by dynamic programming, but with a data-driven method to derive the phone-to-phone substitution costs. The costs were based on the statistical co-occurrence of phones (association strength). Rules for pronunciation variation were derived from this alignment. The rules were pruned using a new metric based on acoustic log likelihood. Well-trained acoustic models are capable of modelling much of the variation seen, and using the acoustic log likelihood to assess the pronunciation rules prevents the lexical modelling from adding variation already accounted for, as shown for direct pronunciation variation modelling. For the non-native task, data-driven pronunciation modelling by learning pronunciation rules gave a significant performance gain. Acoustic log likelihood rule pruning performed better than rule probability pruning. For spontaneous dictation, the pronunciation variation experiments did not improve the performance. The answer to how to better model the variation for spontaneous speech seems to lie neither in the acoustic nor in the lexical modelling. The main differences between read and spontaneous speech are the grammar used and disfluencies such as restarts and long pauses. The language model may thus be the best starting point for further research to achieve better performance for this speaking style.
format dissertation
creatorcontrib Amdal, Ingunn ; Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk ; Svendsen, Torbjørn
advisor Svendsen, Torbjørn
publisher Fakultet for informasjonsteknologi, matematikk og elektroteknikk
creationdate 2002
rights info:eu-repo/semantics/openAccess
oa free_for_read
linktorsrc http://hdl.handle.net/11250/249709
ispartof Dr.ingeniøravhandling, 0809-103X, 2002
language eng
recordid cdi_cristin_nora_11250_249709
source NORA - Norwegian Open Research Archives
subjects automatic speech recognition
lexical modelling
non-native speech
pronunciation variation
TECHNOLOGY: Electrical engineering, electronics and photonics: Electronics
title Learning pronunciation variation: A data-driven approach to rule-based lexicon adaptation for automatic speech recognition
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-19T20%3A52%3A05IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-cristin_3HK&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft.genre=dissertation&rft.atitle=Learning%20pronunciation%20variation:%20A%20data-driven%20approach%20to%20rule-based%20lecxicon%20adaptation%20for%20automatic%20speech%20recognition&rft.btitle=Dr.ingeni%C3%B8ravhandling,%200809-103X&rft.au=Amdal,%20Ingunn&rft.date=2002&rft_id=info:doi/&rft_dat=%3Ccristin_3HK%3E11250_249709%3C/cristin_3HK%3E%3Cgrp_id%3Ecdi_FETCH-cristin_nora_11250_2497093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
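
The description in this record mentions aligning reference and alternative transcriptions by dynamic programming with phone-to-phone substitution costs. The following is a minimal sketch of that general kind of alignment, not the dissertation's implementation. The function name `align`, the `sub_cost` table, and the `default_sub` and `indel` parameters are all illustrative assumptions; the record does not say how the thesis turns association strength into costs, so the table is simply passed in.

```python
from typing import Dict, List, Optional, Tuple

Phone = str
# One aligned position: (reference phone, alternative phone); None marks a gap.
Pair = Tuple[Optional[Phone], Optional[Phone]]


def align(
    ref: List[Phone],
    alt: List[Phone],
    sub_cost: Dict[Tuple[Phone, Phone], float],
    default_sub: float = 1.0,  # assumed cost for substitutions missing from the table
    indel: float = 1.0,        # assumed cost for an insertion or a deletion
) -> Tuple[float, List[Pair]]:
    """Weighted edit-distance alignment of two phone sequences."""

    def sub(a: Phone, b: Phone) -> float:
        # Identical phones cost nothing; otherwise look up the (hypothetical) table.
        return 0.0 if a == b else sub_cost.get((a, b), default_sub)

    n, m = len(ref), len(alt)
    # cost[i][j] = cheapest alignment of ref[:i] with alt[:j]
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = cost[i - 1][0] + indel
    for j in range(1, m + 1):
        cost[0][j] = cost[0][j - 1] + indel

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub(ref[i - 1], alt[j - 1]),  # match/substitution
                cost[i - 1][j] + indel,                            # deletion
                cost[i][j - 1] + indel,                            # insertion
            )

    # Trace back from the end to recover one cheapest alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + sub(ref[i - 1], alt[j - 1]):
            pairs.append((ref[i - 1], alt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or cost[i][j] == cost[i - 1][j] + indel):
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, alt[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs
```

With a table that makes a given substitution cheap, for example `align(["d", "a", "t", "a"], ["d", "e", "t", "a"], {("a", "e"): 0.2})`, the aligner pairs the two vowels as a substitution instead of treating them as a deletion plus an insertion. Aligned pairs of this kind are the sort of evidence from which substitution, insertion, and deletion rules could then be read off.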