DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement
Published in: | IEEE Signal Processing Letters, 2023, Vol. 30, p. 1-5 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated based on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility. |
---|---|
ISSN: | 1070-9908 1558-2361 |
DOI: | 10.1109/LSP.2023.3244428 |
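
The summary describes DeFT-AN as predicting a complex spectral mask that is applied to the STFT of the input. The sketch below illustrates only that final masking step with NumPy and is not the authors' implementation. The naive `stft` helper, the 4-microphone random input, the frame and hop sizes, the choice of channel 0 as the reference, and the all-ones placeholder mask standing in for the network output are all assumptions made for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT of a 1-D signal; returns a complex array (freq_bins, frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1).T

def apply_complex_mask(mix_stft, mask):
    """Element-wise complex multiplication: scales magnitude and shifts phase."""
    return mask * mix_stft

# Hypothetical input: 4 microphones, 1 second at 16 kHz (random noise).
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, 16000))

# Multichannel STFT with shape (channels, freq_bins, frames).
X = np.stack([stft(ch) for ch in mics])

# Placeholder for the mask a trained network would predict from X.
# An all-ones mask leaves the reference channel unchanged.
mask = np.ones(X.shape[1:], dtype=complex)

# Assumption: the mask is applied to channel 0 as the reference microphone.
enhanced = apply_complex_mask(X[0], mask)
```

Because the mask is complex-valued, it can correct phase as well as magnitude, which a real-valued magnitude mask cannot do.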