A Study on Speech Enhancement Using Deep Temporal Convolutional Neural Network

Bibliographic Details
Main Authors:	Rana, Kuldeep Singh; Chen, Li-Wen; Tang, Li-Hsin; Hong, Wei-Tyng
Format:	Conference Proceeding
Language:	English
Subjects:	Convolution; Convolutional neural networks; Data models; deep learning; Linear programming; Neural networks; Noise measurement; Speech enhancement; temporal convolutional network

Description
Summary:	Recently, end-to-end deep neural network (DNN) architectures have shown potential advantages for speech enhancement in various noise environments. However, the computational cost of such systems remains a concern. Temporal convolutional networks (TCNs), built from a deep stack of dilated convolution blocks, are a notable achievement among these DNN-based approaches: a TCN can capture long-range dependencies in temporal patterns without exhausting computational resources. Inspired by the success of the TCN, we investigate a fully convolutional time-domain network that uses dilated 1-D convolutional blocks to estimate a mask. The mask is applied to the output of the encoder to obtain the enhanced speech signal. The proposed model is trained on 300 hours of speech data corrupted by multiple noise types, directly with an objective function that maximizes the scale-invariant signal-to-noise ratio (SI-SNR). Experimental results show that both voice quality and SI-SNR improve: on 5 hours of noisy test speech, the average SI-SNR increases significantly, by 7.4 dB. The proposed method offers these advantages for end-to-end speech enhancement on resource-constrained systems.
ISSN:	2575-8284
DOI:	10.1109/ICCE-TW52618.2021.9602920
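
The summary names three mechanisms without giving formulas: the SI-SNR training objective, the long receptive field of stacked dilated convolutions, and a mask applied to the encoder output. The Python sketch below illustrates each one under stated assumptions; it is not the authors' implementation. It assumes the commonly used SI-SNR definition (zero-mean signals, projection of the estimate onto the target), a dilation schedule of 1, 2, 4, ... repeated several times, and a simple elementwise mask. The function names (si_snr, si_snr_loss, receptive_field, apply_mask) and any kernel sizes or block counts are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def si_snr(estimate: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """SI-SNR in dB, using the commonly cited definition (assumed, not taken from the paper)."""
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(target)
    # Remove the mean so that a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # s_target = (<est, tgt> / ||tgt||^2) * tgt: the part of the estimate explained by the target.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]
    # e_noise = est - s_target: everything else counts as distortion.
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(e * e for e in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_loss(estimate: Sequence[float], target: Sequence[float]) -> float:
    """Training loss: minimizing -SI-SNR is equivalent to maximizing SI-SNR."""
    return -si_snr(estimate, target)


def receptive_field(kernel_size: int, blocks_per_repeat: int, repeats: int) -> int:
    """Receptive field (in frames) of stacked dilated 1-D convolutions.

    Assumed (hypothetical) schedule: dilations 1, 2, 4, ..., 2**(blocks_per_repeat - 1),
    with the whole stack repeated `repeats` times.
    """
    per_repeat = sum((kernel_size - 1) * 2 ** b for b in range(blocks_per_repeat))
    return 1 + repeats * per_repeat


def apply_mask(encoded: List[List[float]], mask: List[List[float]]) -> List[List[float]]:
    """Elementwise product of an estimated mask and the encoder output (frames x features)."""
    return [[e * m for e, m in zip(enc_row, mask_row)] for enc_row, mask_row in zip(encoded, mask)]
```

For example, with a kernel size of 3 and eight blocks repeated three times (illustrative values only), the stack has 24 convolutional layers, yet receptive_field(3, 8, 3) returns 1531 frames: depth grows linearly while the receptive field doubles with each block inside a repeat. This is one way to read the summary's claim that a TCN captures long-range dependencies without a large computational cost. Because SI-SNR rescales the target to match the estimate, multiplying the estimate by a constant or adding a DC offset leaves the score unchanged.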