Sequential attention mechanism for weakly supervised video anomaly detection
Published in: | Expert systems with applications 2023-11, Vol.230, p.120599, Article 120599 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Surveillance cameras are installed across various sectors of a smart city to capture ongoing events for monitoring purposes. The analysis of these surveillance videos is an important research topic that involves activity recognition, object detection, anomaly recognition, and other problems. Of these, anomaly recognition is the most common task in a smart city and has received significant attention with the aim of ensuring public safety and security. Many works have been published in this field, but these schemes have not been able to provide the desired detection outcomes. Mainstream anomaly recognition methods depend heavily on strong supervision to achieve satisfactory performance, and obtaining such supervision is time-consuming and impractical. With a particular focus on this problem, this article presents a novel anomaly recognition model based on a deep convolutional neural network (CNN), in which deep features are extracted from surveillance video frames. These features are forwarded to the proposed temporal convolutional network (TCN), which includes a multi-head attention module that enables it to recognise anomalies in these videos. The multi-head temporal attention mechanism enables the model to capture more of the key temporal information in the complex surveillance environment. Experiments conducted on standard datasets and a comparison with state-of-the-art approaches demonstrate the effectiveness and superiority of the proposed framework, which achieves increases in accuracy of 0.9%, 1.9%, 0.65%, 0.27%, and 1.5% on the UCF-Crime2local, LAD-2000, RWF-2000, RLVS, and Crowd Violence datasets, respectively. These outcomes indicate the suitability of the method for deployment in real-time surveillance schemes. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2023.120599 |
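
The summary describes per-frame CNN features passed through a temporal network with multi-head attention. The sketch below illustrates only the generic idea of multi-head self-attention over the time axis, followed by a video-level score. It is not the article's architecture: the CNN backbone and the TCN's convolutions are omitted, the weights are random rather than trained, and every function name, the mean pooling, and the sigmoid output are assumptions made for illustration.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    """Random projection matrix (stand-in for learned weights)."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def multi_head_temporal_attention(features, num_heads, rng):
    """Self-attention across time steps.

    features: T x D list of per-frame feature vectors (hypothetical CNN outputs).
    Returns the T x D attended features and, per head, the T x T attention weights.
    """
    t, d = len(features), len(features[0])
    assert d % num_heads == 0, "feature size must be divisible by the head count"
    d_head = d // num_heads
    head_outputs, head_weights = [], []
    for _ in range(num_heads):
        wq, wk, wv = (random_matrix(d, d_head, rng) for _ in range(3))
        q, k, v = matmul(features, wq), matmul(features, wk), matmul(features, wv)
        k_transposed = [list(col) for col in zip(*k)]
        scores = matmul(q, k_transposed)  # T x T frame-to-frame similarity
        weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
        head_outputs.append(matmul(weights, v))
        head_weights.append(weights)
    # Concatenate the heads along the feature axis for every time step.
    attended = [sum((head[i] for head in head_outputs), []) for i in range(t)]
    return attended, head_weights

def video_anomaly_score(features, num_heads=2, seed=0):
    """Assumed readout: mean-pool over time, then a linear layer and a sigmoid."""
    rng = random.Random(seed)
    attended, weights = multi_head_temporal_attention(features, num_heads, rng)
    d = len(attended[0])
    pooled = [sum(frame[j] for frame in attended) / len(attended) for j in range(d)]
    w_out = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    logit = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-logit)), weights
```

Calling `video_anomaly_score(frames, num_heads=2)` on a list of equally sized feature vectors returns a score between 0 and 1 and the per-head attention weights; each row of a weight matrix sums to 1 and shows how strongly one frame attends to every other frame. In a weakly supervised setting only a video-level label would be available to train such a score, which is why the readout here pools over time.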