Assessing the performances of different neural network architectures for the detection of screams and shouts in public transportation

Bibliographic Details
Published in: Expert Systems with Applications, 2019-03, Vol. 117, p. 29-41
Main Authors: Laffitte, Pierre; Wang, Yun; Sodoyer, David; Girin, Laurent
Format: Article
Language: English
Description
Summary:
• Models including temporal decoding deal better with temporal information.
• Vocal sounds are those which benefit the most from temporal information.
• Decomposing the acoustic environment helps classify shouts and speech sounds.
• Real-world cases impose further constraints, which affect the system’s performance.

As intelligent transportation systems become more prevalent, automatic surveillance systems become more relevant. While such systems rely heavily on video signals, other types of signals can also be used to monitor the security of passengers. The present article proposes an audio-based intelligent system for surveillance in public transportation, investigating the use of some state-of-the-art artificial intelligence methods for the automatic detection of screams and shouts. We present test results produced on a database of sounds occurring in subway trains in real working conditions, by classifying sounds into screams, shouts and other categories using different neural network architectures. We analyze the relevance of these architectures to the analysis of audio signals. We report encouraging results, given the difficulty of the task, especially when a high level of surrounding noise is present.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2018.08.052