Loading…
Mixed T-domain and TF-domain Magnitude and Phase representations for GAN-based speech enhancement
Deep learning has made significant advancements in speech enhancement, which plays a crucial role in improving the quality of speech signals in noisy conditions. In this paper, we propose a new approach called M-DGAN, which introduces a time (T)-domain encoder-decoder structure with rich channel rep...
Saved in:
Published in: | Scientific reports 2024-07, Vol.14 (1), p.17698-13, Article 17698 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Deep learning has made significant advancements in speech enhancement, which plays a crucial role in improving the quality of speech signals in noisy conditions. In this paper, we propose a new approach called M-DGAN, which introduces a time (T)-domain encoder-decoder structure with rich channel representations into the time-frequency (TF)-domain generator framework, resulting in a new generator structure with mixed magnitude and phase representations in the T and TF-domains. The proposed mixed T-domain and TF-domain generator, incorporating the cascaded reworked conformer (CRC) structure, exhibits improved modeling capability and adaptability. Test results on the Voice Bank + DEMAND public dataset show that our method achieves the highest score with
P
S
E
Q
=
3.52
and performs well on all the remaining metrics when compared to the current state-of-the-art methods. In addition, tests on the NISQA_TEST_LIVETALK real dataset of the NISQA Corpus show the breadth and robustness of our model on speech enhancement tasks. |
---|---|
ISSN: | 2045-2322 2045-2322 |
DOI: | 10.1038/s41598-024-68708-w |