Speech quality assessment with WARP‐Q: From similarity to subsequence dynamic time warp cost

Bibliographic Details
Published in: IET Signal Processing, 2022-12, Vol. 16 (9), p. 1050-1070
Main Authors: Jassim, Wissam A.; Skoglund, Jan; Chinen, Michael; Hines, Andrew
Format: Article
Language: English
Description
Summary: Speech coding has been shown to achieve good speech quality using either waveform matching or parametric reconstruction. For very low bit rate streams, recently developed generative speech models can reconstruct high‐quality wideband speech from the bit streams of standard parametric encoders at less than 3 kb/s. Generative codecs produce high‐quality speech based on synthesising speech from a DNN and the parametric input. Existing objective speech quality models (e.g., ViSQOL and POLQA) cannot be used to accurately evaluate the quality of coded speech from generative models as they penalise based on signal differences not apparent in subjective listening test results. This paper presents WARP‐Q, a full‐reference objective speech quality metric that uses a dynamic time warping cost for MFCC representations of the signals. It is robust to low perceptual signal changes introduced by low bit rate neural vocoders. An evaluation using waveform matching, parametric, and generative neural vocoder‐based codecs as well as channel and environmental noise shows that WARP‐Q has better correlation and codec quality ranking for novel codecs compared to traditional metrics as well as the versatility of capturing other types of degradations, such as additive noise and transmission channel degradations.
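The summary names a subsequence dynamic time warping (DTW) cost over MFCC features as the core of the metric. The following is a minimal sketch of a generic subsequence DTW cost, not the authors' implementation. The function name `subsequence_dtw_cost`, the Euclidean frame distance, and the step pattern are assumptions made for illustration, and plain NumPy arrays stand in for MFCC frames (feature extraction is omitted).

```python
import numpy as np


def subsequence_dtw_cost(query, reference):
    """Minimum accumulated cost of aligning `query` to any contiguous
    stretch of `reference`. Rows are frames, columns are features.

    Hypothetical helper for illustration; not the WARP-Q reference code.
    """
    query = np.asarray(query, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n, m = len(query), len(reference)

    # Pairwise Euclidean distance between every query and reference frame
    # (assumed distance; other frame distances are possible).
    dist = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)

    acc = np.full((n, m), np.inf)
    # Subsequence variant: the path may start at any reference frame,
    # so the first row carries no accumulated horizontal cost.
    acc[0, :] = dist[0, :]
    for i in range(1, n):
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        for j in range(1, m):
            acc[i, j] = dist[i, j] + min(
                acc[i - 1, j],      # advance in the query only
                acc[i, j - 1],      # advance in the reference only
                acc[i - 1, j - 1],  # advance in both
            )
    # The path may also end at any reference frame.
    return float(acc[-1].min())


# Stand-in "feature" matrices: 20 reference frames with 3 coefficients each.
reference = np.arange(60, dtype=float).reshape(20, 3)
exact_patch = reference[5:10]        # a patch that occurs in the reference
shifted_patch = exact_patch + 0.5    # the same patch, slightly perturbed

print(subsequence_dtw_cost(exact_patch, reference))
print(subsequence_dtw_cost(shifted_patch, reference))
```

A patch that occurs verbatim in the reference aligns at zero cost, while a perturbed patch yields a positive cost. That is the sense in which a DTW cost can act as a distance between a degraded signal and its reference; how WARP-Q maps such costs to a quality score is described in the article itself.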
ISSN: 1751-9675, 1751-9683
DOI: 10.1049/sil2.12151