Sequential attention mechanism for weakly supervised video anomaly detection

Bibliographic Details
Published in: Expert Systems with Applications, 2023-11, Vol. 230, Article 120599
Main Authors: Ullah, Waseem; Ullah, Fath U Min; Khan, Zulfiqar Ahmad; Baik, Sung Wook
Format: Article
Language: English
Description
Summary: Surveillance cameras are installed across various sectors of a smart city in order to capture ongoing events for monitoring purposes. The analysis of these surveillance videos is an important research topic that involves activity recognition, object detection, anomaly recognition, and other problems. However, anomaly recognition is the most common task in a smart city, and has received significant attention with the aim of ensuring public safety and security. Many works have been published in this field, but these schemes have not been able to provide the desired detection outcomes. Mainstream anomaly recognition methods are heavily dependent on strong supervision to achieve satisfactory performance, which is time-consuming and impractical. With a particular focus on this problem, this article presents a deep convolution neural network (CNN)-based novel anomaly recognition model, in which deep features are extracted from surveillance video frames. These features are forwarded to the proposed temporal convolution network (TCN) that includes a multi-head attention module to enable it to recognise anomalies from these videos. The multi-head temporal attention mechanism enables the model to obtain more key temporal information about the complex surveillance environment. Experiments conducted on standard datasets and a comparison with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed framework, which achieves increases in accuracy of 0.9%, 1.9%, 0.65%, 0.27%, and 1.5% on the UCF-Crime2local, LAD-2000, RWF-2000, RLVS, and Crowd Violence datasets, respectively. These outcomes indicate the suitability of our method for deployment in real-time surveillance schemes.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.120599