
Pronunciation modeling using a finite-state transducer representation

The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition we...

Bibliographic Details
Published in: Speech communication 2005-06, Vol.46 (2), p.189-203
Main Authors: Hazen, Timothy J.; Hetherington, I. Lee; Shu, Han; Livescu, Karen
Format: Article
Language: English
container_end_page 203
container_issue 2
container_start_page 189
container_title Speech communication
container_volume 46
creator Hazen, Timothy J.
Hetherington, I. Lee
Shu, Han
Livescu, Karen
description The MIT SUMMIT speech recognition system models pronunciation using a phonemic baseform dictionary along with rewrite rules for modeling phonological variation and multi-word reductions. Each pronunciation component is encoded within a finite-state transducer (FST) representation whose transition weights can be trained using an EM algorithm for finite-state networks. This paper explains the modeling approach we use and the details of its realization. We demonstrate the benefits and weaknesses of the approach both conceptually and empirically using the recognizer for our JUPITER weather information system. Our experiments demonstrate that the use of phonological rewrite rules within our system achieves word error rate reductions between 4% and 9% over different test sets when compared against a system using no phonological rewrite rules.
doi_str_mv 10.1016/j.specom.2005.03.004
format article
publisher Amsterdam: Elsevier B.V
rights 2005 Elsevier B.V.
2005 INIST-CNRS
coden SCOMDH
tpages 15
lds50 peer_reviewed
oa free_for_read
fulltext fulltext
identifier ISSN: 0167-6393
ispartof Speech communication, 2005-06, Vol.46 (2), p.189-203
issn 0167-6393
1872-7182
language eng
recordid cdi_proquest_miscellaneous_85631071
source Elsevier; Linguistics and Language Behavior Abstracts (LLBA)
subjects Applied sciences
Exact sciences and technology
Information, signal and communications theory
Signal processing
Speech processing
Telecommunications and information theory
title Pronunciation modeling using a finite-state transducer representation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-23T13%3A00%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Pronunciation%20modeling%20using%20a%20finite-state%20transducer%20representation&rft.jtitle=Speech%20communication&rft.au=Hazen,%20Timothy%20J.&rft.date=2005-06-01&rft.volume=46&rft.issue=2&rft.spage=189&rft.epage=203&rft.pages=189-203&rft.issn=0167-6393&rft.eissn=1872-7182&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2005.03.004&rft_dat=%3Cproquest_cross%3E85631071%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c475t-320e205d7154ad2bf14267ceaf1cb02aa0cdaf3f9d1ff2ce346924f5e98cca533%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=29528648&rft_id=info:pmid/&rfr_iscdi=true