Loading…

In defence of metric learning for speaker recognition

The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recogniti...

Full description

Saved in:

Bibliographic Details
Published in:	arXiv.org 2020-04
Main Authors:	Joon Son Chung, Huh, Jaesung, Mun, Seongkyu, Lee, Minjae, Heo, Hee Soo, Choe, Soyeon, Ham, Chiheon, Jung, Sunghwan, Bong-Jin, Lee, Han, Icksang
Format:	Article
Language:	English
Subjects:	Classification Learning Speech recognition
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	The objective of this paper is 'open-set' speaker recognition of unseen speakers, where ideal embeddings should be able to condense information into a compact utterance-level representation that has small intra-speaker and large inter-speaker distance. A popular belief in speaker recognition is that networks trained with classification objectives outperform metric learning methods. In this paper, we present an extensive evaluation of most popular loss functions for speaker recognition on the VoxCeleb dataset. We demonstrate that the vanilla triplet loss shows competitive performance compared to classification-based losses, and those trained with our proposed metric learning objective outperform state-of-the-art methods.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2003.11982