Inverting mappings from smooth paths through R(n) to paths through R(m): A technique applied to recovering articulation from acoustics

Motor theories, which postulate that speech perception is related to linguistically significant movements of the vocal tract, have guided speech perception research for nearly four decades but have had little impact on automatic speech recognition. In this paper, we describe a signal processing tech...

Bibliographic Details
Published in:	Speech communication 2007-05, Vol.49 (5), p.361-383
Main Authors:	Hogden, John; Rubin, Philip; McDermott, Erik; Katagiri, Shigeru; Goldstein, Louis
Format:	Article
Language:	English

container_end_page	383
container_issue	5
container_start_page	361
container_title	Speech communication
container_volume	49
creator	Hogden, John ; Rubin, Philip ; McDermott, Erik ; Katagiri, Shigeru ; Goldstein, Louis
description	Motor theories, which postulate that speech perception is related to linguistically significant movements of the vocal tract, have guided speech perception research for nearly four decades but have had little impact on automatic speech recognition. In this paper, we describe a signal processing technique named MIMICRI that may help link motor theory with automatic speech recognition by providing a practical approach to recovering articulator positions from acoustics. MIMICRI's name reflects three important operations it can perform on time-series data: it can reduce the dimensionality of a data set (manifold inference); it can blindly invert nonlinear functions applied to the data (mapping inversion); and it can use temporal context to estimate intermediate data (contextual recovery of information). In order for MIMICRI to work, the signals to be analyzed must be functions of unobservable signals that lie on a linear subspace of the set of all unobservable signals. For example, MIMICRI will typically work if the unobservable signals are band-pass and we know the pass-band, as is the case for articulator motions. We discuss the abilities of MIMICRI as they relate to speech processing applications, particularly as they relate to inverting the mapping from speech articulator positions to acoustics. We then present a mathematical proof that explains why MIMICRI can invert nonlinear functions, which it can do even in some cases in which the mapping from the unobservable variables to the observable variables is many-to-one. Finally, we show that MIMICRI is able to infer accurately the positions of the speech articulators from speech acoustics for vowels. Five parameters estimated by MIMICRI were more linearly related to articulator positions than 128 spectral energies. [Copyright 2007 Elsevier B.V.]
doi_str_mv	10.1016/j.specom.2007.02.008
format	article
fulltext	fulltext
identifier	ISSN: 0167-6393
ispartof	Speech communication, 2007-05, Vol.49 (5), p.361-383
issn	0167-6393
language	eng
recordid	cdi_proquest_miscellaneous_85684533
source	ScienceDirect Freedom Collection 2022-2024; Linguistics and Language Behavior Abstracts (LLBA)
title	Inverting mappings from smooth paths through R(n) to paths through R(m): A technique applied to recovering articulation from acoustics
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T15%3A22%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inverting%20mappings%20from%20smooth%20paths%20through%20R(n)%20to%20paths%20through%20R(m):%20A%20technique%20applied%20to%20recovering%20articulation%20from%20acoustics&rft.jtitle=Speech%20communication&rft.au=Hogden,%20John&rft.date=2007-05-01&rft.volume=49&rft.issue=5&rft.spage=361&rft.epage=383&rft.pages=361-383&rft.issn=0167-6393&rft.coden=SCOMDH&rft_id=info:doi/10.1016/j.specom.2007.02.008&rft_dat=%3Cproquest%3E85345599%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-p132t-4ea8ef36ef48c96ee8bc330f2638acccbaf00f563602ab550a790f4a2f0224e43%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=85345599&rft_id=info:pmid/&rfr_iscdi=true