Domain adaptation towards speaker-independent ultrasound tongue imaging based articulatory-to-acoustic conversion

Bibliographic Details
Published in: The Journal of the Acoustical Society of America, 2023-03, Vol. 153 (3_supplement), p. A366-A366
Main Authors: You, Kang; Xu, Kele; Wang, Jilong; Feng, Ming
Format: Article
Language: English
Description
Summary: In this paper, we address the articulatory-to-acoustic conversion problem, which aims to estimate the mel-spectrogram of the acoustic signal from midsagittal ultrasound tongue images of the vocal tract. Previous attempts employed statistical methods for the inversion between articulatory movements and speech, while deep learning has begun to dominate this field. Despite sustained efforts, mapping performance can vary greatly across speakers, and most previous methods are constrained to the speaker-dependent scenario. Here, we present a novel approach to speaker-independent mapping inspired by domain adaptation. Specifically, we explore decoupling the spectrogram-generation task from the speaker-recognition task. Leveraging a newly designed loss function and an adversarial learning strategy, we improve performance in the speaker-independent scenario. To demonstrate the effectiveness of the proposed method, extensive experiments are conducted on the Tongue and Lips (TaL) corpus. Objective evaluation compares the generated spectrograms against the ground truth using three metrics: MSE, SSIM, and CW-SSIM. The results indicate that our proposed method achieves superior performance in the speaker-independent scenario compared with competitive solutions. Our code is available at https://github.com/xianyi11/Articulatory-to-Acoustic-with-Domain-Adaptation.
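
The decoupling described in the summary is, in spirit, adversarial domain adaptation: a shared encoder feeds a spectrogram-generation head while an adversarial speaker classifier is trained against it, pushing the shared features toward speaker invariance. Below is a minimal PyTorch sketch of one common realization of this idea, using a gradient-reversal layer. The network sizes, input shape, loss weighting, and class names here are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

# Minimal sketch of adversarial decoupling via gradient reversal.
# All layer sizes and the toy 64x64 ultrasound frame shape are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates (and scales) gradients on backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class Adapter(nn.Module):
    def __init__(self, feat_dim=256, n_mels=80, n_speakers=10, lam=1.0):
        super().__init__()
        self.lam = lam
        # Shared encoder over flattened ultrasound frames (toy 64x64 input).
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, feat_dim), nn.ReLU())
        # Task head: predicts one mel-spectrogram column per frame.
        self.mel_head = nn.Linear(feat_dim, n_mels)
        # Adversary: tries to identify the speaker from the shared features.
        self.spk_head = nn.Linear(feat_dim, n_speakers)

    def forward(self, x):
        h = self.encoder(x)
        mel = self.mel_head(h)
        # Gradient reversal makes the encoder *remove* speaker cues while the
        # classifier still tries to recover them, decoupling the two tasks.
        spk = self.spk_head(GradReverse.apply(h, self.lam))
        return mel, spk

# One toy training step on random data.
model = Adapter()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
frames = torch.randn(8, 1, 64, 64)        # batch of ultrasound frames
mel_target = torch.randn(8, 80)           # matching mel-spectrogram columns
spk_labels = torch.randint(0, 10, (8,))   # speaker ids
opt.zero_grad()
mel_pred, spk_pred = model(frames)
loss = nn.functional.mse_loss(mel_pred, mel_target) \
     + nn.functional.cross_entropy(spk_pred, spk_labels)
loss.backward()
opt.step()

Minimizing the speaker cross-entropy through the reversed gradients drives the encoder toward speaker-invariant features, while the mel MSE keeps those features informative for spectrogram generation; this matches the abstract's goal of improving the speaker-independent case.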
ISSN: 0001-4966, 1520-8524
DOI: 10.1121/10.0019181