
Kullback–Leibler divergence and sample skewness for pathological voice quality assessment

Bibliographic Details
Published in: Biomedical Signal Processing and Control, 2020-03, Vol. 57, p. 101697, Article 101697
Main Authors: Barreira, Ramiro R. A.; Ling, Lee Luan
Format: Article
Language: English
Description
Summary:
• A voice pathology detection system grounded on simple and clear-cut foundations.
• The features are designed specifically in terms of the effects of voice pathologies on the speech signal.
• The system delivers high accuracy with a low number of parameters.

This paper proposes new features aimed at improving the performance of an automatic voice pathology detection system. The features are designed specifically in terms of the effects of voice pathologies on the speech signal. The system is intended to deliver high accuracy with a low number of parameters. Kullback–Leibler divergence (KLD) applied to consecutive frames of the speech signal provides a measure of voice instability. In this work, the KLD is applied to each frame's histogram and to a modified form of its spectrum named the higher amplitude suppression spectrum (HASS). The H-KLD (histogram KLD) and the HASS-KLD are two of the three features presented. A third feature, the short-term sample skewness of the signal, is proposed as a measure of the level of damping of the voice pitch-period waveform. The H-KLD, the HASS-KLD, and the sample skewness are employed along with mel-frequency cepstral coefficients (MFCC) in a voice pathology detection system. The system is composed of a Gaussian mixture model (GMM) classifier and two generalized extreme value (GEV) distribution classifiers, which are fused by means of a Gaussian naïve Bayes classifier. A standard subset of the Massachusetts Eye and Ear Infirmary (MEEI) voice disorders database is adopted for evaluating the system. The obtained global success rate of 99.55% shows that the proposed features are suitable for pathological voice quality assessment.
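
To make the two time-domain features in the summary more concrete, the following is a minimal Python sketch of a histogram KLD between consecutive frames and of a per-frame sample skewness. It is an illustration under stated assumptions, not the authors' implementation: the summary does not give the frame length, hop size, number of histogram bins, smoothing, or the exact form of the divergence, so the function names (`h_kld`, `sample_skewness`), the default parameter values, and the additive smoothing constant `eps` are all hypothetical choices. The HASS-KLD is not sketched because the summary does not define how the HASS is computed.

```python
import math


def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def histogram(frame, bins, lo, hi, eps=1e-6):
    """Normalized amplitude histogram over [lo, hi].

    Each bin starts at eps (an assumed smoothing constant) so that no
    probability is zero and the divergence below stays finite.
    """
    counts = [eps] * bins
    width = (hi - lo) / bins
    for x in frame:
        idx = int((x - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range samples to the edge bins
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kld(p, q):
    """Kullback-Leibler divergence D(p || q) in nats for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def h_kld(signal, frame_len=256, hop=128, bins=16, lo=-1.0, hi=1.0):
    """KLD between the amplitude histograms of each pair of consecutive frames."""
    hists = [histogram(f, bins, lo, hi) for f in frames(signal, frame_len, hop)]
    return [kld(hists[i], hists[i + 1]) for i in range(len(hists) - 1)]


def sample_skewness(frame):
    """Sample skewness of one frame: third central moment over variance**1.5."""
    n = len(frame)
    mean = sum(frame) / n
    m2 = sum((x - mean) ** 2 for x in frame) / n
    m3 = sum((x - mean) ** 3 for x in frame) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


# A perfectly periodic signal versus one whose amplitude drops halfway through.
period = [math.sin(2 * math.pi * k / 64) for k in range(64)]
steady = period * 16
unstable = steady[:512] + [0.2 * x for x in steady[512:]]

steady_kld = h_kld(steady)
unstable_kld = h_kld(unstable)
```

In this toy setup every frame of `steady` has the same histogram, so the consecutive-frame divergences vanish, while the amplitude change in `unstable` produces a nonzero divergence around the transition. That mirrors the idea in the summary that the KLD between consecutive frames acts as a measure of voice instability.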
ISSN: 1746-8094, 1746-8108
DOI: 10.1016/j.bspc.2019.101697