
On the Robustness of Deep Features for Audio Event Classification in Adverse Environments


Bibliographic Details
Main Authors: Martin-Morato, Irene, Cobos, Maximo, Ferri, Francesc J.
Format: Conference Proceeding
Language: English
Description
Summary: Deep features, the responses to complex input patterns learned within deep neural networks, have recently shown strong performance in image recognition tasks, motivating their use for audio analysis as well. These features provide multiple levels of abstraction, making it possible to select a sufficiently generalized layer to identify classes not seen during training. This generalization capability is particularly valuable given the scarcity of fully labeled audio datasets. However, unlike for classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the impact of acoustically adverse environments on the performance of deep features has not been evaluated in detail. In this paper, we analyze the robustness of deep features under adverse conditions such as noise, reverberation and segmentation errors. The selected features are extracted from SoundNet, a deep convolutional neural network (CNN) for audio classification that takes raw audio segments as input. The results show that performance is severely affected by noise and reverberation, leaving room for improvement in robustness across different kinds of acoustic scenarios.
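The adverse-noise condition described in the abstract is typically simulated by mixing a noise recording into the clean signal at a controlled signal-to-noise ratio. The sketch below is not from the paper; it is a minimal, generic illustration of that mixing step using only NumPy, with an illustrative function name.

```python
import numpy as np

def add_noise_at_snr(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the target SNR in dB.

    Illustrative helper, not from the paper: tiles or truncates the
    noise to the clean signal's length, then scales it so that
    10*log10(P_clean / P_noise_scaled) equals `snr_db`.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)
    # Match the noise length to the clean signal (tile, then truncate).
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale factor that yields the requested SNR.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

Sweeping `snr_db` over a range (e.g. 20 dB down to 0 dB) then re-extracting deep features from the degraded audio is one common way to build the kind of robustness curves the paper discusses.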
ISSN: 2164-5221
DOI: 10.1109/ICSP.2018.8652438