A Human Auditory Perception Loss Function Using Modified Bark Spectral Distortion for Speech Enhancement

Bibliographic Details
Published in: Neural Processing Letters, 2020-06, Vol. 51 (3), pp. 2945-2957
Main Authors: Shu, Xiaofeng; Zhou, Yi; Liu, Hongqing; Truong, Trieu-Kien
Format: Article
Language: English
Description
Summary: Human listeners often have difficulty understanding speech in the presence of background noise in everyday speech communication environments. Recently, deep neural network (DNN)-based techniques have been successfully applied to speech enhancement and have achieved significant improvements over conventional approaches. However, existing DNN-based methods usually minimize a log-power-spectrum-based or masking-based mean squared error (MSE) between the enhanced output and the training target (e.g., the ideal ratio mask (IRM) of the clean speech), a criterion that is not closely related to human auditory perception. In this letter, a modified Bark spectral distortion loss function, which can be regarded as an auditory-perception-based MSE, is proposed to replace the conventional MSE in DNN-based speech enhancement in order to further improve objective perceptual quality. Experimental results show that, compared with DNN-based methods using the conventional MSE criterion, the proposed method improves speech enhancement performance, especially in objective perceptual quality, in all experimental settings.
ISSN: 1370-4621, 1573-773X
DOI: 10.1007/s11063-020-10212-z
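The summary contrasts a conventional masking-based MSE (e.g., against the IRM) with a loss computed in an auditory, Bark-scale domain. The NumPy sketch below illustrates that contrast under stated assumptions; it is not the authors' loss function. It assumes the Zwicker-Terhardt approximation of the Bark scale, a hypothetical `bark_band_matrix` helper that sums STFT power bins into integer Bark bands, and a power-law exponent of 0.3 as a rough loudness compression (loudness roughly doubles per 10 dB, i.e. intensity**0.3). Modified Bark spectral distortion measures also apply a noise-masking threshold so that only audible distortion counts; that step is omitted here. A real training loss would be written with a differentiable framework's tensor operations rather than NumPy.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (assumed here)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def bark_band_matrix(n_bins, sample_rate):
    """Hypothetical helper: 0/1 matrix of shape (n_bands, n_bins).

    Each STFT bin (0 Hz .. Nyquist) is assigned to the integer Bark band
    that contains its centre frequency, so W @ power sums bins per band.
    """
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    bands = np.floor(hz_to_bark(freqs)).astype(int)
    W = np.zeros((bands.max() + 1, n_bins))
    W[bands, np.arange(n_bins)] = 1.0
    return W


def ideal_ratio_mask(clean_pow, noise_pow, beta=0.5):
    """IRM = (S^2 / (S^2 + N^2))^beta, computed from power spectra."""
    return (clean_pow / (clean_pow + noise_pow + 1e-12)) ** beta


def mask_mse(est_mask, target_mask):
    """Conventional masking-based MSE criterion."""
    return float(np.mean((est_mask - target_mask) ** 2))


def bark_loudness_mse(est_pow, ref_pow, W, exponent=0.3):
    """Simplified Bark-domain loss (assumption, not the paper's MBSD loss).

    est_pow, ref_pow: power spectra of shape (frames, n_bins).
    Bins are pooled into Bark bands, compressed with a power law as a
    rough loudness model, and compared with an MSE.
    No noise-masking threshold is applied.
    """
    est_loud = (est_pow @ W.T) ** exponent  # (frames, n_bands)
    ref_loud = (ref_pow @ W.T) ** exponent
    return float(np.mean((est_loud - ref_loud) ** 2))
```

With beta = 0.5, squaring the IRM and applying it to the noisy power spectrum (assuming speech and noise powers add) recovers the clean power spectrum, so `bark_loudness_mse(irm**2 * noisy, clean, W)` is close to zero, while the unprocessed noisy spectrum yields a positive loss. Pooling into Bark bands weights low frequencies more finely than high ones, which is the perceptual motivation the summary describes.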