Non-intrusive speech quality assessment with attention-based ResNet-BiLSTM

Bibliographic Details
Published in: Signal, image and video processing, 2023-10, Vol.17 (7), p.3377-3385
Main Authors: Shen, Kailai; Yan, Diqun; Ye, Zhe; Xu, Xianbo; Gao, JinXing; Dong, Li; Peng, Chengbin; Yang, Kun
Format: Article
Language: English
Description
Summary: Speech quality in online conferencing applications is frequently affected by a variety of factors, such as background noise, reverberation, packet loss and network jitter. In real scenarios, it is impossible to obtain a clean reference signal for evaluating the quality of the conferencing speech. Therefore, an effective non-intrusive speech quality assessment (NISQA) method is necessary. In this paper, we propose a new network framework for NISQA based on ResNet and BiLSTM. ResNet is utilized to extract local features, while BiLSTM is used to integrate representative features with long-term time dependencies and sequential characteristics. Considering that ResNet may lose context information when applied to the NISQA task, we propose a variant of ResNet that preserves the time series information of the conferencing speech. The experimental results demonstrate that the scores predicted by the proposed method correlate highly with the mean opinion score of clean, noisy and processed speech.
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-023-02559-2
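
The summary reports a high correlation between the method's output and the mean opinion score (MOS). As a minimal sketch of how such agreement is commonly measured, the Python snippet below computes a Pearson correlation coefficient between MOS labels and predicted scores. The record does not say which correlation measure the paper uses, so Pearson is an assumption here; the function name `pearson` and all numeric values are hypothetical and are not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; NOT data from the paper.
mos_labels = [1.8, 2.5, 3.1, 3.9, 4.4]  # listener ratings on a 1-5 scale
predicted = [2.0, 2.4, 3.3, 3.7, 4.5]   # hypothetical model outputs

r = pearson(mos_labels, predicted)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 means the predicted scores rise and fall with the listener ratings. It does not by itself show that the predictions are on the same scale as the ratings, which is why evaluations of this kind often report an error measure alongside the correlation.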