Sound-Event Classification Using Robust Texture Features for Robot Hearing

Bibliographic Details
Published in:	IEEE transactions on multimedia 2017-03, Vol.19 (3), p.447-458
Main Authors:	Ren, Jianfeng; Jiang, Xudong; Yuan, Junsong; Magnenat-Thalmann, Nadia
Format:	Article
Language:	English
Subjects:	Auditory system; Band-dependent local binary pattern; band-independent local binary pattern; Classification; Electric power distribution; Feature extraction; Hearing; Histograms; Image classification; multi-channel local binary pattern; Noise; Noise sensitivity; Pixels; robot hearing; Robots; Sound; sound-event classification; Spectrogram; Spectrograms; Texture; Time-frequency analysis

Description
Summary:	Sound-event classification often utilizes time-frequency analysis, which produces an image-like spectrogram. Recent approaches such as spectrogram image features and subband power distribution image features extract local image statistics, such as mean and variance, from the spectrogram. They have demonstrated good performance. However, we argue that such simple image statistics cannot adequately capture the complex texture details of the spectrogram. Thus, we propose to extract the local binary pattern (LBP) from the logarithm of the Gammatone-like spectrogram. However, the LBP feature is sensitive to noise. After analyzing the spectrograms of sound events and audio noise, we find that the magnitude of pixel differences, which is discarded by the LBP feature, carries important information for sound-event classification. We thus propose a multichannel LBP feature via pixel difference quantization to improve the robustness to audio noise. In view of the differences between spectrograms and natural images, and the reliability issues of LBP features, we propose two projection-based LBP features to better capture the texture information of the spectrogram. To validate the proposed multichannel projection-based LBP features for robot hearing, we have built a new sound-event classification database, the NTU-SEC database, in the context of social interaction between humans and robots. It is publicly available to promote research on sound-event classification in a social context. The proposed approaches are compared with the state of the art on the RWCP database and the NTU-SEC database. They consistently demonstrate superior performance under various noise conditions.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2016.2618218
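
Illustrative sketch (not part of the catalog record or the article): the summary describes extracting a local binary pattern (LBP) from a log spectrogram and then quantizing pixel differences into several channels. The Python below is a minimal sketch of that general idea under stated assumptions, not the authors' implementation. It substitutes a plain STFT for the Gammatone-like spectrogram, uses the basic 8-neighbour LBP, and stands in for "pixel difference quantization" with a hypothetical scheme that recomputes the LBP at several difference thresholds. The function names, frame sizes, and threshold values are all invented for illustration, and the projection-based features are not covered.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Log-magnitude STFT spectrogram, shape (freq_bins, frames).

    Assumption: a plain STFT stands in for the Gammatone-like
    spectrogram named in the summary.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(magnitude + eps).T


# Offsets of the 8 neighbours, clockwise from the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image, threshold=0.0):
    """Basic 8-neighbour LBP codes for the interior pixels of a 2-D array.

    A bit is set when (neighbour - centre) >= threshold. With
    threshold=0 this is the usual LBP, which keeps only the sign of
    each pixel difference and discards its magnitude.
    """
    h, w = image.shape
    centre = image[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neighbour = image[1 + dy : h - 1 + dy, 1 + dx : w - 1 + dx]
        codes |= ((neighbour - centre) >= threshold).astype(np.uint8) << bit
    return codes


def lbp_histogram(codes):
    """Normalized 256-bin histogram of LBP codes."""
    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


def multichannel_lbp_histogram(image, thresholds=(0.0, 1.0)):
    """Hypothetical multichannel variant: one LBP histogram per threshold.

    Assumption: thresholding the pixel differences at several levels is
    only a simple stand-in for the quantization scheme in the article.
    """
    return np.concatenate(
        [lbp_histogram(lbp_codes(image, t)) for t in thresholds]
    )
```

For example, `multichannel_lbp_histogram(log_spectrogram(x))` on a one-dimensional NumPy array `x` of audio samples returns a feature vector with 256 entries per threshold, which could then be passed to any standard classifier.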