TRSD: A Time-Varying and Region-Changed Speech Database for Speaker Recognition
Published in: | Circuits, systems, and signal processing, 2022-07, Vol.41 (7), p.3931-3956 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Summary: | Performance degradation caused by intraspeaker variability is an active research topic in speaker recognition, and accuracy loss over time is a common, widely accepted phenomenon in the field. In China, many people move between their birthplace and their place of work, and the differing cultural atmospheres and customs of these regions affect how they pronounce speech. This work focuses on the time-varying and region-changed factors caused by such population migration. The paper introduces a time-varying and region-changed speech database (TRSD) collected from 55 university students over 3 years, containing 3795 utterances in total. To study the impact of these factors on speaker identification and to explore hidden factors that may cause performance degradation, the paper also reports a range of experiments on the database. In these experiments, changes in characteristic parameters (pitch, intensity, formants and spectrogram) are analyzed and grouped by gender and birthplace. The Gaussian mixture model-universal background model (GMM-UBM), a deep neural network (DNN) model, i-vector/PLDA and x-vector/PLDA systems are evaluated on TRSD to provide reference performance. To counter the time-varying and region-changed factors, the paper also provides three corresponding solutions: speaker model adaptation, cepstral mean normalization and mel-frequency cepstral coefficient (MFCC) normalization. |
---|---|
ISSN: | 0278-081X 1531-5878 |
DOI: | 10.1007/s00034-022-01964-1 |
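The summary names cepstral mean normalization as one of the proposed remedies. The record does not describe the authors' implementation, so the following is only a minimal, generic sketch of the standard technique under stated assumptions: features arrive as a list of per-frame cepstral vectors (for example MFCCs) for a single utterance, and the function name `cepstral_mean_normalization` is hypothetical. A stationary convolutional channel effect becomes an additive offset in the cepstral domain, so subtracting the per-coefficient mean over the utterance removes it. The optional variance step (often called CMVN) is an added assumption and is not taken from the paper.

```python
def cepstral_mean_normalization(frames, normalize_variance=False):
    """Generic per-utterance cepstral mean (and optional variance) normalization.

    frames: list of frames, each a list of cepstral coefficients of equal length.
    This is a hypothetical illustration, not the TRSD authors' code.
    """
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across all frames of the utterance.
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    # Subtract the mean: removes a constant (channel) offset per coefficient.
    centered = [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]
    if not normalize_variance:
        return centered
    # Assumed CMVN extension: scale each coefficient to unit variance.
    stds = []
    for d in range(n_coeffs):
        var = sum(c[d] ** 2 for c in centered) / n_frames
        stds.append(var ** 0.5 if var > 0 else 1.0)  # guard against constant dims
    return [[c[d] / stds[d] for d in range(n_coeffs)] for c in centered]


frames = [[1.0, 2.0], [3.0, 6.0]]
print(cepstral_mean_normalization(frames))
print(cepstral_mean_normalization(frames, normalize_variance=True))
```

Because the mean is estimated per utterance, recordings of the same speaker made in different sessions or regions with different microphones are pulled toward a common offset, which is one plausible reason such normalization helps with session and time variability.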