EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning
Published in: | IEEE Signal Processing Letters, 2022, Vol. 29, pp. 2582-2586 |
Main Authors: | Chen, Li-Chin; Chen, Po-Hsun; Tsai, Richard Tzong-Han; Tsao, Yu |
Format: | Article |
Language: | English |
container_end_page | 2586 |
container_issue | |
container_start_page | 2582 |
container_title | IEEE signal processing letters |
container_volume | 29 |
creator | Chen, Li-Chin; Chen, Po-Hsun; Tsai, Richard Tzong-Han; Tsao, Yu |
description | Speech generation and enhancement based on articulatory movements facilitate communication when the scope of verbal communication is absent, e.g., in patients who have lost the ability to speak. Although various techniques have been proposed to this end, electropalatography (EPG), which is a monitoring technique that records contact between the tongue and hard palate during speech, has not been adequately explored. Herein, we propose a novel multimodal EPG-to-speech (EPG2S) system that utilizes EPG and speech signals for speech generation and enhancement. Different fusion strategies of combining EPG and noisy speech signals are examined, and the viability of the proposed method is investigated. Experimental results indicate that EPG2S achieves desirable speech generation outcomes based solely on EPG signals. Further, the addition of noisy speech signals is observed to improve quality and intelligibility of the generated speech signals. Additionally, EPG2S is observed to achieve high-quality speech enhancement based solely on audio signals, with the addition of EPG signals further improving the performance. Finally, the late fusion strategy is deemed to be more effective for both speech generation and enhancement. |
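The abstract contrasts fusion strategies for combining EPG and noisy speech signals, concluding that late fusion works best. The difference between early and late fusion can be illustrated with a minimal sketch; the array shapes, encoder sizes, and the toy linear encoders below are assumptions for illustration, not the paper's actual EPG2S architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 100 time frames, 62 EPG contact electrodes,
# 257 spectrogram bins, a 64-dim latent per branch (all assumed).
epg = rng.random((100, 62))
noisy_spec = rng.random((100, 257))

w_epg = 0.1 * rng.standard_normal((62, 64))
w_spec = 0.1 * rng.standard_normal((257, 64))

def encode(x, w):
    """Toy per-frame linear encoder standing in for one modality branch."""
    return np.tanh(x @ w)

# Early fusion: concatenate the raw modalities first,
# then feed a single joint encoder.
early_input = np.concatenate([epg, noisy_spec], axis=1)   # shape (100, 319)

# Late fusion: encode each modality separately,
# then merge the per-branch latents near the output.
late_latent = np.concatenate(
    [encode(epg, w_epg), encode(noisy_spec, w_spec)], axis=1
)                                                          # shape (100, 128)

print(early_input.shape)   # (100, 319)
print(late_latent.shape)   # (100, 128)
```

In this framing, late fusion lets each modality keep a dedicated encoder, so a noisy audio branch cannot corrupt the EPG representation before merging, which is one plausible reading of why the paper finds late fusion more effective.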
doi_str_mv | 10.1109/LSP.2022.3184636 |
format | article |
identifier | ISSN: 1070-9908 |
ispartof | IEEE signal processing letters, 2022, Vol.29, p.2582-2586 |
issn | 1070-9908 (print); 1558-2361 (electronic) |
language | eng |
recordid | cdi_ieee_primary_9801623 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Audio signals; Decoding; electropalatography; Feature extraction; Intelligibility; Loss measurement; model fusion; Noise measurement; Signal generation; Spectrogram; Speech; Speech enhancement; speech generation; Speech processing; Speech recognition; speech signal; Speech synthesis; Tongue; Verbal communication |
title | EPG2S: Speech Generation and Speech Enhancement Based on Electropalatography and Audio Signals Using Multimodal Learning |