Using combined features to improve speaker verification in the face of limited reverberant data
Published in: | International journal of speech technology 2023-09, Vol.26 (3), p.789-799 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Automatic speaker recognition has attracted significant research attention and performs well in matched conditions, where training and testing environments are similar. However, performance degrades considerably in mismatched conditions such as noise and reverberation. Feature extraction plays a pivotal role in the overall performance of a speaker recognition system. Gammatone Frequency Cepstral Coefficients (GFCCs) are a commonly used feature in this domain. GFCCs are robust to environmental variations, including diverse speaking styles and languages. Nevertheless, they are sensitive to background conditions, including noise and reverberation, which leads to a significant decline in system performance. A novel “Entrocy” feature has been proposed in response to this challenge. Entrocy is the Fourier transform of entropy and aims to estimate the variation of information (or entropy) within an audio segment over time. A composite feature vector is formed by combining the Entrocy feature with GFCCs. The proposed approach was assessed using i-vector PLDA baseline speaker recognition systems. Notably, the Entrocy feature consistently outperforms the well-established GFCC features, exhibiting robustness in these challenging conditions. Speaker verification experiments in controlled environments show that the system can deliver high performance if the reverberation time does not exceed 1.0 s and the speech samples are longer than 5 s. These results highlight the effectiveness of the proposed method in reducing the equal error rate and improving the detection error trade-off, ultimately enhancing the system’s overall accuracy. |
---|---|
ISSN: | 1381-2416 1572-8110 |
DOI: | 10.1007/s10772-023-10048-7 |
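
The summary characterizes Entrocy only as a Fourier transform of entropy tracked over time, combined with GFCCs into one feature vector. This record does not give the paper's framing, entropy estimator, or feature dimensions, so the following is a minimal sketch under assumed choices, not the authors' implementation: the frame length, hop size, histogram-based entropy estimate, number of retained coefficients, and the function names `frame_entropy`, `entropy_spectrum`, and `combine` are all hypothetical.

```python
import numpy as np

def frame_entropy(signal, frame_len=400, hop=160, n_bins=32):
    """Shannon entropy (in bits) of each frame's amplitude histogram.

    Assumed estimator: a fixed-bin histogram per frame. The paper may
    estimate entropy differently.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    entropies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        counts, _ = np.histogram(frame, bins=n_bins)
        p = counts[counts > 0] / counts.sum()  # drop empty bins before the log
        entropies[i] = -np.sum(p * np.log2(p))
    return entropies

def entropy_spectrum(signal, n_coeffs=8, **frame_kwargs):
    """Magnitude of the Fourier transform of the entropy trajectory.

    The mean is removed first, so the coefficients describe how entropy
    fluctuates over time rather than its average level.
    """
    h = frame_entropy(signal, **frame_kwargs)
    h = h - h.mean()
    return np.abs(np.fft.rfft(h))[:n_coeffs]

def combine(gfcc_vector, entropy_vector):
    """Concatenate a GFCC vector with the entropy-based vector."""
    return np.concatenate([gfcc_vector, entropy_vector])

# Usage with synthetic data; the zero vector stands in for real GFCCs,
# which this sketch does not compute.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)   # one second at an assumed 16 kHz
placeholder_gfcc = np.zeros(13)
feature = combine(placeholder_gfcc, entropy_spectrum(audio))
```

Concatenation is the simplest way to form a composite vector; whether the paper combines the features per frame or per utterance is not stated in this record.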