Exploiting Discrete Wavelet Transform Features in Speech Enhancement Technique Adaptive FullSubNet
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Summary: | FullSubNet+ and its adaptive counterpart, adaptive-FSN, are well-known neural network-based speech enhancement (SE) approaches that use a full-band and sub-band fusion model to perform the SE task. These methods train the network on magnitude and complex spectrograms; the network consists mostly of multi-scale time-sensitive channel attention (MulCA) modules and stacked temporal convolution network (TCN) blocks. To capture the phase information of the input time-domain signals more simply, we propose using discrete wavelet transform (DWT) features as an input source in place of the complex spectrogram when developing these SE models. Preliminary experiments on the VoiceBank-DEMAND task show that using DWT features in adaptive-FSN yields higher objective speech quality and intelligibility on the test set, as measured by PESQ and STOI scores, than the original adaptive-FSN arrangement. |
---|---|
ISSN: | 2575-8284 |
DOI: | 10.1109/ICCE-Taiwan58799.2023.10226809 |