Exploiting Discrete Wavelet Transform Features in Speech Enhancement Technique Adaptive FullSubNet

Bibliographic Details
Main Authors: Wu, Zong-Tai; Li, Pei-Fang; Wu, Ping-Chen; Li, Eric S.; Hung, Jeih-Weih
Format: Conference Proceeding
Language: English
Description
Summary: FullSubNet+ and its adaptive counterpart, adaptive-FSN, are well-known neural network-based speech enhancement (SE) approaches that use a full-band and sub-band fusion model. These methods train the network, which consists mostly of multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks, on the magnitude and complex spectrograms. To capture the phase information of input time-domain signals more simply, we propose using discrete wavelet transform (DWT) features as an input source instead of the complex spectrogram to develop these SE models. Preliminary experiments on the VoiceBank-DEMAND task show that using DWT features in adaptive-FSN yields higher objective speech quality and intelligibility on the test set, as measured by PESQ and STOI scores respectively, than the original adaptive-FSN arrangement.
ISSN: 2575-8284
DOI: 10.1109/ICCE-Taiwan58799.2023.10226809
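
Illustration: The summary's central idea is that DWT coefficients are real-valued and computed directly from the time-domain signal, so they carry timing (phase-related) information without a complex spectrogram. The sketch below is only meant to make that concrete. It assumes a one-level Haar wavelet and a 64-sample synthetic frame; the record does not state which wavelet, decomposition depth, or framing the authors used, and the names `haar_dwt`, `haar_idwt`, and `frame` are hypothetical.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT of an even-length sequence (assumed wavelet)."""
    if len(x) % 2 != 0:
        raise ValueError("expected an even number of samples")
    s = 1.0 / math.sqrt(2.0)
    # Approximation (low-pass) and detail (high-pass) coefficients,
    # each half the length of the input.
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the time-domain samples."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.append((a + d) * s)
        x.append((a - d) * s)
    return x

# Hypothetical input: one 64-sample frame of a sinusoid.
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]

approx, detail = haar_dwt(frame)

# Real-valued feature vector with the same length as the frame.
features = approx + detail

# The transform is invertible, so no waveform information is discarded.
reconstructed = haar_idwt(approx, detail)
max_error = max(abs(a - b) for a, b in zip(frame, reconstructed))
```

Because the transform is orthogonal and invertible, `features` preserves the frame's energy and `max_error` stays at floating-point rounding level. How such coefficients are arranged and fed to the adaptive-FSN network is not described in this record.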