Non-intrusive speech quality assessment using multi-resolution auditory model features for degraded narrowband speech

Bibliographic Details
Published in: IET Signal Processing, 2015-12, Vol. 9 (9), p. 638-646
Main Authors: Dubey, Rajesh Kumar; Kumar, Arun
Format: Article
Language: English
Description
Summary: A multi-resolution auditory model (MRAM), built on an auditory perception-based wavelet packet transform, is used for non-intrusive objective speech quality estimation. The MRAM models the time-frequency behaviour of the human auditory system in more detail than earlier models used for non-intrusive speech quality estimation. The objective Mean Opinion Score (MOS) of a degraded narrowband speech utterance is estimated with a Gaussian Mixture Model (GMM) probabilistic approach applied to an MRAM-based feature vector. For comparison, features based on Lyon's auditory model, mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) are each used independently. Combining MFCC and LSF features with MRAM features under the same GMM probabilistic approach is also proposed and investigated. These feature vectors are evaluated against ITU-T Recommendation P.563 and a recently published method by computing the correlation coefficient and the root-mean-square error between the subjective MOS and the estimated objective MOS. The proposed method, which combines MRAM, MFCC, and LSF features, performs better than both of those algorithms.
ISSN:1751-9675
1751-9683
DOI:10.1049/iet-spr.2014.0214
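
The summary names two evaluation measures, the correlation coefficient and the root-mean-square error between subjective and estimated MOS. The sketch below shows how these two figures are commonly computed. It assumes the Pearson form of the correlation coefficient, which the record does not specify; the function names and the sample scores are hypothetical and are not taken from the article.

```python
import math


def pearson_correlation(subjective, estimated):
    """Pearson correlation between two equal-length score lists (assumed form)."""
    n = len(subjective)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_s = sum(subjective) / n
    mean_e = sum(estimated) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((s - mean_s) * (e - mean_e) for s, e in zip(subjective, estimated))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_s == 0 or var_e == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_s * var_e)


def rmse(subjective, estimated):
    """Root-mean-square error between two equal-length score lists."""
    n = len(subjective)
    if n != len(estimated) or n == 0:
        raise ValueError("need two equal-length, non-empty lists")
    return math.sqrt(sum((s - e) ** 2 for s, e in zip(subjective, estimated)) / n)


# Hypothetical scores for illustration only; not data from the article.
subjective_mos = [1.8, 2.4, 3.1, 3.9, 4.3]
estimated_mos = [2.0, 2.5, 2.9, 3.6, 4.4]

r = pearson_correlation(subjective_mos, estimated_mos)
err = rmse(subjective_mos, estimated_mos)
print(f"correlation = {r:.3f}, RMSE = {err:.3f}")
```

A higher correlation and a lower RMSE both indicate that the estimated scores track the listener ratings more closely, which is the sense in which the summary compares the feature sets.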