Multi-Channel Speaker Verification with Conv-TasNet Based Beamformer

Bibliographic Details
Main Authors: Mosner, Ladislav, Plchot, Oldrich, Burget, Lukas, Cernocky, Jan Honza
Format: Conference Proceeding
Language: English
Description
Summary: We focus on the problem of speaker recognition in far-field multichannel data. The main contribution is an alternative way of predicting spatial covariance matrices (SCMs) for a beamformer from the time-domain signal. We propose to use Conv-TasNet, a well-known source separation model, and we adapt it to perform speech enhancement by forcing it to separate speech and additive noise. We experiment with using the STFT of Conv-TasNet outputs to obtain SCMs of speech and noise, and finally, we fine-tune this multi-channel frontend with respect to the speaker verification objective. We successfully tackle the lack of a realistic multichannel training set by using simulated data from the MultiSV corpus. The analysis is performed on its retransmitted and simulated test parts. We achieve consistent improvements with a model 2.7 times smaller than the baseline, which is based on a scheme with a mask-estimating neural network.
ISSN: 2379-190X
DOI: 10.1109/ICASSP43922.2022.9747771
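
The summary describes obtaining SCMs of speech and noise from the STFT of the separated outputs and feeding them to a beamformer. The following is a minimal NumPy sketch of that step, not the authors' implementation. It assumes an MVDR beamformer in the reference-channel formulation (the summary does not say which beamformer is used), and it stands in random complex arrays for the STFTs of the Conv-TasNet speech and noise estimates. The function names `spatial_covariance`, `mvdr_weights`, and `beamform` are hypothetical.

```python
import numpy as np


def spatial_covariance(stft):
    """Time-averaged SCM per frequency bin.

    stft: complex array of shape (channels, freqs, frames).
    Returns an array of shape (freqs, channels, channels).
    """
    n_frames = stft.shape[-1]
    return np.einsum("cft,dft->fcd", stft, stft.conj()) / n_frames


def mvdr_weights(scm_speech, scm_noise, ref_channel=0, eps=1e-8):
    """MVDR weights w = (Phi_n^-1 Phi_s) u / trace(Phi_n^-1 Phi_s).

    This formulation is an assumption made for illustration.
    Returns an array of shape (freqs, channels).
    """
    n_channels = scm_noise.shape[-1]
    # Small diagonal loading keeps the noise SCM invertible.
    loaded = scm_noise + eps * np.eye(n_channels)
    numerator = np.linalg.solve(loaded, scm_speech)
    trace = np.trace(numerator, axis1=-2, axis2=-1)[:, None]
    # Selecting a column is the same as multiplying by a one-hot vector u.
    return numerator[:, :, ref_channel] / (trace + eps)


def beamform(weights, mixture_stft):
    """Apply w^H y in every time-frequency bin; returns (freqs, frames)."""
    return np.einsum("fc,cft->ft", weights.conj(), mixture_stft)


# Synthetic stand-ins for the STFTs of the separated speech and noise.
rng = np.random.default_rng(0)
C, F, T = 4, 5, 50
steering = rng.standard_normal((C, F)) + 1j * rng.standard_normal((C, F))
source = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
speech = steering[:, :, None] * source[None, :, :]
noise = rng.standard_normal((C, F, T)) + 1j * rng.standard_normal((C, F, T))

weights = mvdr_weights(spatial_covariance(speech), spatial_covariance(noise))
enhanced = beamform(weights, speech + noise)
```

In this toy setup the speech is a single point source, so its SCM has rank one and the weights pass the reference-channel speech through undistorted while reducing the noise power. In the system the summary describes, `speech` and `noise` would instead come from the STFT of the separation model's outputs.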