Clustering of spatial cues by semantic segmentation for anechoic binaural source separation
Published in: | Applied Acoustics, 2021-01, Vol. 171, Article 107566 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | The recent introduction of neural networks to speech separation has dramatically boosted separation performance. This paper presents a novel psychoacoustic approach to speech source separation in anechoic conditions, using semantic segmentation of the interaural spectrograms of the audio mixtures. We trained two separate U-Nets (a neural network architecture specialized for semantic segmentation) on the interaural level difference (ILD) spectrogram and the interaural phase difference (IPD) spectrogram of a single source. After training, these U-Nets are used to predict the class of each time-frequency (TF) unit of the interaural spectrograms of the audio mixture. The ILD and IPD soft masks obtained from these U-Nets are combined by a novel scheme that exploits the strength of the interaural cues in different frequency bands. The results show improved separation over two state-of-the-art machine learning source separation systems that use the same interaural cues: an average improvement of 7.32 dB in signal-to-distortion ratio (SDR) and 0.3 points in short-term objective intelligibility (STOI) over the degenerate unmixing estimation technique (DUET) algorithm, and an improvement of 2.51 dB in SDR with comparable intelligibility over the model-based expectation–maximization source separation and localization (MESSL) algorithm. |
---|---|
ISSN: | 0003-682X; 1872-910X |
DOI: | 10.1016/j.apacoust.2020.107566 |
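The summary relies on three signal-level ideas: ILD and IPD spectrograms computed per TF unit, soft masks blended according to frequency band, and SDR as the evaluation metric. The sketch below illustrates these ideas under stated assumptions and is not the authors' implementation. The U-Nets are not reproduced, so the masks are placeholders. The blending rule in `combine_masks` is a hypothetical sigmoid crossover that favours IPD at low frequencies and ILD at high frequencies, which follows the classic duplex-theory intuition rather than the paper's own scheme. The helper names (`stft`, `interaural_spectrograms`, `combine_masks`, `simple_sdr`) and the parameters `crossover_hz` and `width_hz` are illustrative inventions. `simple_sdr` is a plain energy ratio, not the BSS Eval SDR definition commonly used in separation papers.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT with a Hann window; returns a (freq_bins, frames) complex array."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)], axis=1
    )
    return np.fft.rfft(frames, axis=0)

def interaural_spectrograms(left, right, n_fft=512, hop=128, eps=1e-10):
    """Return the left/right STFTs plus ILD (dB) and IPD (radians) per TF unit."""
    L = stft(left, n_fft, hop)
    R = stft(right, n_fft, hop)
    # ILD: level ratio between the ears in dB (eps avoids division by zero).
    ild = 20.0 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))
    # IPD: phase of the cross-spectrum, wrapped to (-pi, pi].
    ipd = np.angle(L * np.conj(R))
    return L, R, ild, ipd

def combine_masks(ild_mask, ipd_mask, freqs, crossover_hz=1500.0, width_hz=500.0):
    """Hypothetical band-dependent blend: weight IPD at low and ILD at high frequencies."""
    # ILD weight rises smoothly from ~0 to ~1 around crossover_hz.
    w_ild = 1.0 / (1.0 + np.exp(-(freqs - crossover_hz) / width_hz))
    w_ild = w_ild[:, None]  # broadcast the per-bin weight across all frames
    return w_ild * ild_mask + (1.0 - w_ild) * ipd_mask

def simple_sdr(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: reference energy over residual energy (not BSS Eval)."""
    residual = reference - estimate
    return 10.0 * np.log10(np.sum(reference**2) / (np.sum(residual**2) + eps))
```

With a combined mask `m` of the same shape as the spectrograms, a masked estimate of the target in the left ear would be `m * L`, which an inverse STFT would turn back into a waveform. The smooth sigmoid is chosen here only so that the two cues hand over gradually instead of switching abruptly at a single frequency.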