A Dual-Channel End-to-End Speech Enhancement Method Using Complex Operations in the Time Domain

Bibliographic Details
Published in:	Applied sciences 2023-07, Vol.13 (13), p.7698
Main Authors:	Pang, Jian, Li, Hongcheng, Jiang, Tao, Wang, Hui, Liao, Xiangning, Luo, Le, Liu, Hongqing
Format:	Article
Language:	English
Subjects:	Analysis; Beamforming; complex network; Domain names; dual channel; end to end; Fourier transforms; Hilbert transformation; Long short-term memory; Methods; Microphones; Neural networks; Noise reduction; Signal processing; Spatial data; Speech; speech enhancement; Speech processing; Waveforms

Description
Summary:	This study investigates the use of complex-valued operations to perform multichannel speech enhancement in the time domain with a neural network. Previous studies have demonstrated the advantages of incorporating complex operations into neural network design; however, they have focused solely on frequency-domain enhancement techniques. In contrast, this study presents an end-to-end approach that performs speech enhancement in the time domain. We used the Hilbert transform to generate complex time-domain waveforms from the real-valued microphone signals as inputs to the network, which allowed us to build an end-to-end approach that exploits spatial information. To handle these complex inputs, we developed a complex neural adaptive beamformer (CNAB), which uses complex shared long short-term memory (LSTM), split-LSTM, and complex convolutions to generate the beamforming output. We then developed a complex full convolutional network (CFCN) to enhance the beamforming output, using complex dilated convolutions to model the long-term temporal dependencies of speech. Cascading the CNAB and CFCN yields the final end-to-end time-domain enhancement network, named CNABCFCN. We trained and tested CNABCFCN on the deep noise suppression (DNS) challenge dataset. Our results show that the complex operations provide advantages over the baseline model. Furthermore, the proposed CNABCFCN outperformed the other compared networks on both objective and subjective measures.
ISSN:	2076-3417
DOI:	10.3390/app13137698
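The summary states that the Hilbert transform turns the real microphone waveforms into complex time-domain inputs. The standard way to do this is to form the analytic signal x + jH{x}. The sketch below computes it with the FFT method (zero the negative frequencies, double the positive ones), which is the same construction scipy.signal.hilbert uses. It is a minimal illustration, not the authors' preprocessing code; the function name analytic_signal is hypothetical, and any framing or normalization the paper applies is not shown.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """Return x + j*H{x} along the last axis (works for shape (T,) or (C, T)).

    FFT method: keep DC (and Nyquist for even lengths) unchanged,
    double positive frequencies, zero negative frequencies.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    # h broadcasts over leading (channel) axes.
    return np.fft.ifft(spectrum * h, axis=-1)
```

For a dual-channel recording stored as an array of shape (2, T), analytic_signal returns a complex array of the same shape whose real part is the original waveform and whose imaginary part is its Hilbert transform.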
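Complex convolutions, which the summary lists as building blocks of both the CNAB and the CFCN, are commonly implemented with two real-valued weight sets, using (W_r + jW_i) * (x_r + jx_i) = (W_r*x_r - W_i*x_i) + j(W_r*x_i + W_i*x_r). The sketch below shows this decomposition for a single 1-D channel and checks it against NumPy's native complex convolution. It assumes, without confirmation from the abstract, that the paper's complex layers follow this common formulation; the name complex_conv1d is hypothetical.

```python
import numpy as np


def complex_conv1d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """1-D complex convolution ('valid' mode) built from four real convolutions."""

    def conv(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.convolve(a, b, mode="valid")

    xr, xi = x.real, x.imag
    wr, wi = w.real, w.imag
    real = conv(xr, wr) - conv(xi, wi)  # real part of the complex product
    imag = conv(xi, wr) + conv(xr, wi)  # imaginary part of the complex product
    return real + 1j * imag
```

np.convolve flips the kernel, whereas neural-network "convolution" layers usually compute cross-correlation; the real/imaginary split holds either way because both operations are bilinear. In a framework, the four real convolutions map onto two real convolution layers applied to the stacked real and imaginary inputs.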
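Neural adaptive beamformers typically have a network predict one FIR filter per channel and then apply a filter-and-sum step, y[n] = sum_c sum_k h_c[k] x_c[n-k]. The sketch below shows only that final step on complex time-domain channels. The filters are set by hand, standing in for the filters that a network such as the CNAB (with its LSTM layers) would predict; that the CNAB applies its filters in exactly this form is an assumption, and filter_and_sum is a hypothetical name.

```python
import numpy as np


def filter_and_sum(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Filter-and-sum beamforming.

    x: (C, T) complex channel waveforms.
    h: (C, K) complex FIR filters, one per channel.
    Returns the (T,) beamformed output (causal, truncated to T samples).
    """
    num_channels, num_samples = x.shape
    if h.shape[0] != num_channels:
        raise ValueError("need one filter per channel")
    y = np.zeros(num_samples, dtype=complex)
    for c in range(num_channels):
        # Causal FIR filtering of channel c, then summation across channels.
        y += np.convolve(x[c], h[c], mode="full")[:num_samples]
    return y
```

As a sanity check, if the second channel is the first one delayed by two samples, delaying the first channel by two samples and averaging (a delay-and-sum special case) reproduces the delayed signal exactly.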
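The summary says the CFCN uses complex dilated convolutions to model long-term temporal dependencies. The reason is receptive-field growth: a stack of 1-D convolutions with kernel size k and dilations d_1, ..., d_L sees 1 + (k - 1)(d_1 + ... + d_L) input samples. The sketch below computes this and confirms it by passing an impulse through a stack of dilated complex kernels. The doubling dilation schedule is a common pattern used here as a hypothetical configuration, not the paper's reported one, and all function names are illustrative.

```python
import numpy as np


def dilate_kernel(w: np.ndarray, dilation: int) -> np.ndarray:
    """Insert (dilation - 1) zeros between consecutive kernel taps."""
    out = np.zeros((len(w) - 1) * dilation + 1, dtype=w.dtype)
    out[::dilation] = w
    return out


def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Receptive field, in samples, of stacked dilated 1-D convolutions."""
    return 1 + (kernel_size - 1) * sum(dilations)


def stack_impulse_response(w: np.ndarray, dilations: list[int]) -> np.ndarray:
    """Push a unit impulse through a stack of dilated convolutions sharing kernel w."""
    y = np.array([1.0 + 0j])
    for d in dilations:
        y = np.convolve(y, dilate_kernel(w, d), mode="full")
    return y
```

With kernel size 3 and dilations 1, 2, 4, ..., 128 (eight layers), receptive_field gives 511 samples, about 32 ms at an assumed 16 kHz sampling rate, while the parameter count grows only linearly with depth.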