Class-GE2E: Speaker Verification Using Self-Attention and Transfer Learning with Loss Combination
Published in: Electronics (Basel) 2022-03, Vol. 11(6), p. 893
Main Authors: ,
Format: Article
Language: English
Subjects:
Summary: Recent studies show that speaker verification performance improves when an attention mechanism is employed instead of temporal or statistical pooling techniques. This paper proposes an advanced multi-head attention method that utilizes a sorted vector of the frame-level features to capture higher correlations among them. The study also proposes a transfer learning scheme that maximizes the effectiveness of two loss functions, the classifier-based cross-entropy loss and the metric-based GE2E loss, in learning the distance between embeddings. The sorted multi-head attention (SMHA) method outperforms conventional attention methods, achieving a 4.55% equal error rate (EER). The proposed transfer learning scheme with the Class-GE2E loss function significantly improved the attention-based systems; in particular, the EER of the SMHA decreased to 4.39% when transfer learning with the Class-GE2E loss was employed. The experimental results demonstrate that incorporating greater correlation between frame-level features into multi-head attention processing, and combining the two loss functions through transfer learning, is highly effective for improving speaker verification performance.
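The record does not include the paper's exact formulation, but the Class-GE2E idea, combining a classification cross-entropy loss with the metric-based GE2E softmax loss over speaker embeddings, can be sketched in NumPy. Everything below (function names, the `alpha` weighting, and the `w`/`b` similarity scaling) is an illustrative assumption, not the authors' implementation:

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """GE2E-style softmax loss for emb of shape (N speakers, M utterances, D).
    Embeddings are L2-normalized; the centroid of the true speaker excludes
    the query utterance (leave-one-out), as in the original GE2E loss."""
    N, M, _ = emb.shape
    emb = emb / np.linalg.norm(emb, axis=-1, keepdims=True)
    centroids = emb.mean(axis=1)
    centroids /= np.linalg.norm(centroids, axis=-1, keepdims=True)
    total = 0.0
    for j in range(N):
        for i in range(M):
            # leave-one-out centroid for the query's own speaker
            c_self = (emb[j].sum(axis=0) - emb[j, i]) / (M - 1)
            c_self /= np.linalg.norm(c_self)
            cos = emb[j, i] @ centroids.T          # cosine to each centroid
            cos[j] = emb[j, i] @ c_self
            s = w * cos + b                        # scaled similarity scores
            total += -s[j] + np.log(np.exp(s).sum())  # softmax loss term
    return total / (N * M)

def cross_entropy_loss(logits, labels):
    """Standard speaker-classification cross entropy (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def class_ge2e_loss(emb, logits, labels, alpha=0.5):
    """Hypothetical weighted combination of the two objectives."""
    return alpha * cross_entropy_loss(logits, labels) \
        + (1.0 - alpha) * ge2e_softmax_loss(emb)
```

Both component losses are non-negative, so the combined objective is as well; in the paper's transfer learning scheme, the combination is applied when fine-tuning a network pre-trained with the classification loss alone.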
ISSN: 2079-9292
DOI: 10.3390/electronics11060893