
EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning

Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies of combining EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve the quality and intelligibility of the generated speech signals. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. Finally, the late fusion strategy is deemed to be more effective for both speech generation and enhancement.
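
This record does not include code; purely as an illustration of the late-fusion strategy the abstract describes (encoding each modality separately and combining the latent streams only before decoding), the following is a minimal PyTorch-style sketch. All module names, layer sizes, and the 62-contact EPG dimension are assumptions for the example, not the published EPG2S architecture.

    # Minimal late-fusion sketch, assuming a 62-contact EPG palate and
    # 257-bin spectrogram frames; the actual EPG2S model may differ.
    import torch
    import torch.nn as nn

    class LateFusionEPG2S(nn.Module):
        """Encode EPG frames and noisy-speech spectrogram frames separately,
        then fuse the two latent streams late, before decoding to a clean
        speech spectrogram."""
        def __init__(self, epg_dim=62, spec_dim=257, hidden=256):
            super().__init__()
            # One encoder per modality: "late" fusion means the modalities
            # meet only after each has been encoded on its own.
            self.epg_enc = nn.LSTM(epg_dim, hidden, batch_first=True)
            self.audio_enc = nn.LSTM(spec_dim, hidden, batch_first=True)
            # Fusion + decoder: concatenate latent frames, map to spectrogram.
            self.decoder = nn.Sequential(
                nn.Linear(2 * hidden, hidden),
                nn.ReLU(),
                nn.Linear(hidden, spec_dim),
            )

        def forward(self, epg, noisy_spec):
            # epg:        (batch, frames, epg_dim)   tongue-palate contact patterns
            # noisy_spec: (batch, frames, spec_dim)  noisy log-magnitude spectrogram
            h_epg, _ = self.epg_enc(epg)
            h_aud, _ = self.audio_enc(noisy_spec)
            fused = torch.cat([h_epg, h_aud], dim=-1)  # late fusion by concatenation
            return self.decoder(fused)  # enhanced/generated spectrogram frames

    # Smoke test with random tensors.
    model = LateFusionEPG2S()
    out = model(torch.randn(2, 100, 62), torch.randn(2, 100, 257))
    print(out.shape)  # torch.Size([2, 100, 257])

An early-fusion variant would instead concatenate the raw EPG and spectrogram features per frame and feed them to a single encoder; the abstract reports that the late-fusion arrangement was the more effective of the two for both tasks.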

Bibliographic Details
Published in: IEEE Signal Processing Letters, 2022, Vol. 29, pp. 2582-2586
Main Authors: Chen, Li-Chin; Chen, Po-Hsun; Tsai, Richard Tzong-Han; Tsao, Yu
Format: Article
Language:English
Subjects: Audio signals; Decoding; electropalatography; Feature extraction; Intelligibility; Loss measurement; model fusion; Noise measurement; Signal generation; Spectrogram; Speech; Speech enhancement; speech generation; Speech processing; Speech recognition; speech signal; Speech synthesis; Tongue; Verbal communication
DOI: 10.1109/LSP.2022.3184636
ISSN: 1070-9908
EISSN: 1558-2361
Publisher: New York: IEEE
Source: IEEE Electronic Library (IEL) Journals