On the Robustness of Deep Features for Audio Event Classification in Adverse Environments
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Deep features, responses to complex input patterns learned within deep neural networks, have recently shown great performance in image recognition tasks, motivating their use for audio analysis tasks as well. These features provide multiple levels of abstraction, which makes it possible to select a sufficiently generalized layer to identify classes not seen during training. This generalization capability is particularly useful given the lack of completely labeled audio datasets. However, unlike for classical hand-crafted features such as Mel-frequency cepstral coefficients (MFCCs), the impact of acoustically adverse environments on their performance has not been evaluated in detail. In this paper, we analyze the robustness of deep features under adverse conditions such as noise, reverberation, and segmentation errors. The selected features are extracted from SoundNet, a deep convolutional neural network (CNN) for audio classification that takes raw audio segments as input. The results show that performance is severely affected by noise and reverberation, leaving room for improvement in robustness to different kinds of acoustic scenarios.
ISSN: 2164-5221
DOI: 10.1109/ICSP.2018.8652438
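
As a rough illustration of the kind of evaluation the summary describes, the sketch below extracts intermediate-layer activations ("deep features") from a waveform CNN and probes their sensitivity to an adverse condition by comparing clean and noise-corrupted inputs. This is a minimal sketch under assumptions, not the paper's setup: the tiny randomly initialized stand-in network, the 10 dB SNR, and the cosine-similarity probe are all placeholders, whereas the paper itself uses pretrained SoundNet layers and measures classification performance under noise, reverberation, and segmentation errors.

```python
import torch
import torch.nn as nn

# Stand-in 1-D CNN over raw waveforms. The paper uses pretrained
# SoundNet layers; this toy network only illustrates how intermediate
# activations serve as "deep features".
class WaveformCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=64, stride=2), nn.ReLU(), nn.MaxPool1d(8)
        )
        self.conv2 = nn.Sequential(
            nn.Conv1d(16, 32, kernel_size=32, stride=2), nn.ReLU(), nn.MaxPool1d(8)
        )

    def forward(self, x):
        h1 = self.conv1(x)   # shallow, more signal-specific features
        h2 = self.conv2(h1)  # deeper, more abstract features
        return h1, h2


def add_noise_at_snr(clean: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Corrupt a waveform with white noise scaled to a target SNR in dB."""
    noise = torch.randn_like(clean)
    signal_power = clean.pow(2).mean()
    noise_power = noise.pow(2).mean()
    # Scale noise so that signal_power / (scale**2 * noise_power)
    # equals the target SNR expressed as a linear power ratio.
    scale = torch.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise


model = WaveformCNN().eval()
wave = torch.randn(1, 1, 44100)  # placeholder: 1 s of audio at 44.1 kHz

with torch.no_grad():
    _, feat_clean = model(wave)
    _, feat_noisy = model(add_noise_at_snr(wave, snr_db=10.0))

# A simple robustness probe: cosine similarity between deep features of
# the clean and the corrupted input; a large drop suggests the features
# are sensitive to the adverse condition.
sim = torch.cosine_similarity(feat_clean.flatten(), feat_noisy.flatten(), dim=0)
print(f"deep-feature similarity at 10 dB SNR: {sim.item():.3f}")
```

In the paper's actual protocol, the extracted features feed a classifier, and robustness is reported as the change in classification performance across noise, reverberation, and segmentation conditions rather than as raw feature similarity.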