Using combined features to improve speaker verification in the face of limited reverberant data

Bibliographic Details
Published in: International Journal of Speech Technology, 2023-09, Vol. 26 (3), pp. 789-799
Main Authors: Al-Karawi, Khamis A., Mohammed, Duraid Y.
Format: Article
Language: English
Description
Summary: Automatic speaker recognition has attracted significant research attention and performs impressively in matched conditions, where the training and testing environments are similar. However, performance degrades considerably in mismatched conditions, such as noise and reverberation. Feature extraction plays a pivotal role in the overall performance of a speaker recognition system. Gammatone Frequency Cepstral Coefficients (GFCCs) are a commonly used feature in this domain. GFCCs are robust to variations such as diverse speaking styles and languages. Nevertheless, they are sensitive to background conditions, including noise and reverberation, which leads to a significant decline in system performance. A novel “Entrocy” feature has been proposed in response to this challenge. Entrocy is the Fourier transform of the entropy and aims to estimate the variation of information (or entropy) within an audio segment over time. A composite feature vector is formed by combining the Entrocy feature with GFCCs. The proposed approach was assessed using i-vector PLDA baseline speaker recognition systems. Notably, the Entrocy feature consistently outperforms the well-established GFCC features and is robust in these challenging conditions. Speaker verification experiments in controlled environments show that the system can deliver high performance if the reverberation time does not exceed 1.0 s and the speech samples are longer than 5 s. These results highlight the effectiveness of the proposed method in reducing the equal error rate and improving the detection error trade-off, ultimately enhancing the system’s overall accuracy.
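
The summary describes Entrocy only at a high level, so the following is a minimal sketch rather than the authors' implementation. It assumes that "entropy" means the Shannon entropy of each frame's normalized power spectrum, that Entrocy is the magnitude of the Fourier transform of that entropy trajectory, and that the composite vector is formed by appending the utterance-level Entrocy values to every GFCC frame. The function names, frame sizes, and number of coefficients are all hypothetical, and the GFCC matrix is taken as given.

```python
import numpy as np

def frame_entropy(signal, frame_len=400, hop=160):
    """Shannon entropy (bits) of each frame's normalized power spectrum.
    Assumption: the abstract does not say how entropy is computed."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    entropies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        p = power / (power.sum() + 1e-12)   # treat the spectrum as a distribution
        entropies[i] = -np.sum(p * np.log2(p + 1e-12))
    return entropies

def entrocy(signal, n_coeffs=8, frame_len=400, hop=160):
    """Magnitude of the Fourier transform of the entropy trajectory.
    Assumption: keeping the first n_coeffs bins is an illustrative choice."""
    h = frame_entropy(signal, frame_len, hop)
    h = h - h.mean()                        # drop the DC offset before the transform
    return np.abs(np.fft.rfft(h))[:n_coeffs]

def combine(gfcc, entrocy_vec):
    """Append the utterance-level Entrocy vector to every GFCC frame.
    Assumption: the abstract does not specify how the features are combined."""
    tiled = np.tile(entrocy_vec, (gfcc.shape[0], 1))
    return np.hstack([gfcc, tiled])
```

For one second of audio at 16 kHz with these defaults, `frame_entropy` yields 98 frames, so a 98 x 13 GFCC matrix combined with 8 Entrocy values would give a 98 x 21 composite feature matrix.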
ISSN: 1381-2416, 1572-8110
DOI: 10.1007/s10772-023-10048-7