
TRSD: A Time-Varying and Region-Changed Speech Database for Speaker Recognition

Bibliographic Details
Published in: Circuits, Systems, and Signal Processing, 2022-07, Vol. 41 (7), pp. 3931-3956
Main Authors: Li, Dongdong, Liu, Jinlin, Wang, Zhe, Li, Yanqiong, Chen, Baijun, Cai, Lizhi
Format: Article
Language: English
Description
Summary: Performance degradation caused by intraspeaker variability is an active research topic in speaker recognition, and a drop in accuracy over time has become a common, widely accepted phenomenon in the field. In China, many people move between their birthplaces and their workplaces, and differences in local culture and customs affect how they pronounce speech. This work focuses on the time-varying and region-changed factors caused by such population migration. The paper introduces a time-varying and region-changed speech database (TRSD), collected from 55 university students over 3 years and containing 3795 utterances in total. To study the impact of the time-varying and region-changed factors on speaker identification, and to explore hidden factors that may lead to performance degradation, the paper also reports a range of experiments on the database. In these experiments, changes in characteristic parameters (pitch, intensity, formants and spectrogram) are analyzed and grouped by gender and birthplace. The Gaussian mixture model-universal background model (GMM-UBM), a deep neural network (DNN) model, i-vector/PLDA and x-vector/PLDA are evaluated on TRSD to provide reference performance. To address the time-varying and region-changed factors, the paper also provides three corresponding solutions: speaker model adaptation, cepstral mean normalization and mel-frequency cepstral coefficient (MFCC) normalization.
ISSN: 0278-081X; 1531-5878
DOI: 10.1007/s00034-022-01964-1
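Cepstral mean normalization, one of the three solutions named in the summary, is a standard technique, and the sketch below shows only its generic form; it is not the authors' implementation. It assumes each utterance is already given as a list of per-frame cepstral vectors (for example MFCCs), and the function name `cepstral_mean_normalize` is hypothetical. The idea: a fixed channel or recording-condition effect is convolutional in the time domain, so it becomes an additive constant in the cepstral domain, and subtracting the per-utterance mean of each coefficient removes it.

```python
from typing import List


def cepstral_mean_normalize(frames: List[List[float]]) -> List[List[float]]:
    """Generic cepstral mean normalization (hypothetical helper, not from the paper).

    `frames` holds one utterance: each inner list is the cepstral vector of one frame.
    Each coefficient has its mean over all frames of the utterance subtracted.
    """
    if not frames:
        return []
    n_coeffs = len(frames[0])
    if any(len(frame) != n_coeffs for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")
    n_frames = len(frames)
    # Per-coefficient mean across time (frames).
    means = [sum(frame[k] for frame in frames) / n_frames for k in range(n_coeffs)]
    # Remove the mean, which cancels any constant additive (channel) offset.
    return [[frame[k] - means[k] for k in range(n_coeffs)] for frame in frames]
```

For instance, the frames `[[1.0, 2.0], [3.0, 6.0]]` normalize to `[[-1.0, -2.0], [1.0, 2.0]]`, and adding the same constant offset to every frame leaves that result unchanged, which is why the method can compensate for session-level channel differences between recordings made at different times or places.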