A Study on Speech Enhancement Using Deep Temporal Convolutional Neural Network
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | End-to-end deep neural network (DNN) architectures have recently shown potential advantages for speech enhancement across various noise environments. However, the computational cost of such systems remains a concern. Temporal convolutional networks (TCNs), constructed from a deep stack of dilated convolution blocks, are a notable achievement among these DNN-based approaches. A TCN can retain information about long-range dependencies in temporal patterns without exhausting computational resources. Inspired by the successful development of the TCN, we investigate a fully convolutional time-domain network that adopts dilated 1-D convolutional blocks to estimate a masking function. The mask is applied to the output of the encoder to obtain the enhanced speech signal. The proposed neural model is trained on 300 hours of speech data corrupted by multiple noise types, directly with an objective function that maximizes the scale-invariant signal-to-noise ratio (SI-SNR). The experimental results show that both voice quality and SI-SNR improve: the average SI-SNR increases by 7.4 dB on 5 hours of noisy test speech. The proposed method is therefore well suited to end-to-end speech enhancement on resource-constrained systems. |
---|---|
ISSN: | 2575-8284 |
DOI: | 10.1109/ICCE-TW52618.2021.9602920 |
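
The summary names SI-SNR as the training objective but does not spell out the formula. The sketch below shows the commonly used definition (zero-mean both signals, project the estimate onto the clean target, then compare the projected energy with the residual energy). It is an illustration only: the function name `si_snr`, the `eps` stabilizer, and the synthetic sine-plus-noise test signal are assumptions, not details taken from the paper.

```python
import math
import random


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean target.

    Hypothetical helper using the commonly cited definition; the paper's
    exact formulation is not given in the summary.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    n = len(target)

    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    est = [x - mean_e for x in estimate]
    tgt = [x - mean_t for x in target]

    # Project the estimate onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    scale = dot / (tgt_energy + eps)
    s_target = [scale * t for t in tgt]

    # Whatever is left after the projection is treated as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]

    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


# Synthetic example (assumed data): a 440 Hz tone at 16 kHz plus Gaussian noise.
random.seed(0)
clean = [math.sin(2.0 * math.pi * 440.0 * i / 16000.0) for i in range(16000)]
noise = [random.gauss(0.0, 1.0) for _ in range(16000)]
noisy = [c + 0.3 * v for c, v in zip(clean, noise)]
less_noisy = [c + 0.1 * v for c, v in zip(clean, noise)]

print(f"SI-SNR (noisy):      {si_snr(noisy, clean):.2f} dB")
print(f"SI-SNR (less noisy): {si_snr(less_noisy, clean):.2f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate leaves the score essentially unchanged. A training loss would typically minimize the negative of this value.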