
Sound Event Detection by Consistency Training and Pseudo-Labeling With Feature-Pyramid Convolutional Recurrent Neural Networks




Bibliographic Details
Main Authors:	Koh, Chih-Yuan, Chen, You-Siang, Liu, Yi-Wen, Bai, Mingsian R.
Format:	Conference Proceeding
Language:	English
Subjects:	Convolution; CRNN; Event detection; feature pyramid; Performance gain; Recurrent neural networks; semi-supervised learning; Semisupervised learning; Sound event detection; Training; Transforms; weakly supervised learning

Description
Summary:	Due to the high cost of large-scale strong labeling, sound event detection (SED) using only weakly labeled and unlabeled data has drawn increasing attention in recent years. To exploit a large amount of unlabeled in-domain data efficiently, we applied three semi-supervised learning strategies: interpolation consistency training (ICT), shift consistency training (SCT), and weakly pseudo-labeling. In addition, we propose FP-CRNN, a convolutional recurrent neural network (CRNN) with feature-pyramid (FP) components that leverages temporal information by utilizing features at different scales. Experiments were conducted on DCASE 2020 Task 4. In terms of event-based F-measure, these approaches outperform the official baseline system (34.8%), with the highest F-measure, 48.0%, achieved by an FP-CRNN trained with the combination of all three strategies.
ISSN:	2379-190X
DOI:	10.1109/ICASSP39728.2021.9414350
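The summary names ICT and SCT but does not give their formulation. As a rough illustration only, the sketch below shows generic interpolation- and shift-consistency losses in NumPy. It assumes a mean-teacher setup (teacher weights as an exponential moving average of student weights), a mean-squared-error consistency term, a Beta-distributed mixing coefficient, and circular time shifts via `np.roll`. The toy per-frame model and every function name here (`toy_model`, `ict_loss`, `sct_loss`, `ema_update`) are hypothetical; this is not the authors' implementation, and it omits pseudo-labeling and the FP-CRNN architecture entirely.

```python
from functools import partial

import numpy as np


def toy_model(x, w):
    """Hypothetical frame-level SED model: (frames, features) -> (frames, classes) probabilities."""
    return 1.0 / (1.0 + np.exp(-x @ w))


def ict_loss(student, teacher, x1, x2, lam):
    """Interpolation consistency (assumed MSE form): the student's output on a
    mix of two unlabeled clips should match the same mix of teacher outputs."""
    x_mix = lam * x1 + (1.0 - lam) * x2
    target = lam * teacher(x1) + (1.0 - lam) * teacher(x2)
    return float(np.mean((student(x_mix) - target) ** 2))


def sct_loss(student, teacher, x, shift):
    """Shift consistency (assumed MSE form, circular shift along time axis 0):
    predictions on a time-shifted clip should equal the shifted teacher predictions."""
    x_shift = np.roll(x, shift, axis=0)
    target = np.roll(teacher(x), shift, axis=0)
    return float(np.mean((student(x_shift) - target) ** 2))


def ema_update(teacher_w, student_w, alpha=0.999):
    """Mean-teacher style update: teacher weights track an EMA of student weights."""
    return alpha * teacher_w + (1.0 - alpha) * student_w


# Toy usage: 100 frames, 64 features, 10 event classes (all sizes hypothetical).
rng = np.random.default_rng(0)
w_student = rng.normal(size=(64, 10))
w_teacher = w_student.copy()
student = partial(toy_model, w=w_student)
teacher = partial(toy_model, w=w_teacher)

x1, x2 = rng.normal(size=(2, 100, 64))
lam = rng.beta(0.5, 0.5)  # mixing coefficient drawn from Beta(a, a), a = 0.5 assumed
print("ICT loss:", ict_loss(student, teacher, x1, x2, lam))
print("SCT loss:", sct_loss(student, teacher, x1, shift=8))
w_teacher = ema_update(w_teacher, w_student)
```

Because the toy model treats each frame independently, shifting the input and shifting the output commute, so the SCT loss is essentially zero when student and teacher share weights. A model with temporal context, such as a CRNN, would not have this property, which is what makes the shift-consistency term informative in practice.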