TransCNN: Hybrid CNN and transformer mechanism for surveillance anomaly detection
Published in: Engineering Applications of Artificial Intelligence, 2023-08, Vol. 123, Article 106173
Main Authors:
Format: Article
Language: English
Summary: Surveillance video anomaly detection (SVAD) is a challenging task due to the variations in object scale, discrimination, and unexpected events, the impact of the background, and the wide range of definitions of anomalous events across surveillance contexts. In this work, we introduce an end-to-end hybrid convolutional neural network (CNN) and vision transformer-based framework for anomaly detection. The proposed framework uses spatial and temporal information from a surveillance video to detect anomalous events and operates in two steps: in the first step, an efficient backbone CNN model extracts spatial features; in the second step, these features are passed to the transformer-based model to learn the long-term temporal relationships between various complex surveillance events. The features from the backbone model are fed to a sequential learning model in which temporal self-attention is utilised to generate an attention map; this allows the proposed framework to learn spatiotemporal features effectively and to detect anomalous events. Our experimental results on various benchmark VAD datasets confirm the validity of the proposed framework, which outperforms other state-of-the-art approaches by achieving high AUC values of 94.6%, 98.4%, and 89.6% on the ShanghaiTech, UCSD Ped2, and CUHK Avenue datasets, respectively.
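The record does not include code, but the two-step pipeline the summary describes (a CNN backbone extracting per-frame spatial features, followed by a transformer encoder whose temporal self-attention models relationships across frames) can be sketched as below. This is a minimal illustration in PyTorch under assumed choices: the ResNet-18 backbone, embedding size, head count, layer count, and per-frame sigmoid scoring head are placeholders, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TransCNNSketch(nn.Module):
    """Minimal sketch of the two-step pipeline from the summary:
    (1) a CNN backbone extracts spatial features per frame;
    (2) a transformer encoder applies temporal self-attention across
    the frame sequence. All sizes here are illustrative placeholders."""

    def __init__(self, feat_dim=512, n_heads=8, n_layers=2, max_len=64):
        super().__init__()
        cnn = resnet18(weights=None)
        # Keep everything up to and including global average pooling,
        # dropping the ImageNet classification head.
        self.backbone = nn.Sequential(*list(cnn.children())[:-1])
        # Learned positional embedding so self-attention sees frame order.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 1)  # per-frame anomaly score

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        # Step 1: spatial features for each frame independently.
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # (b*t, 512, 1, 1)
        feats = feats.reshape(b, t, -1)                      # (b, t, 512)
        # Step 2: temporal self-attention over the frame sequence.
        feats = self.temporal(feats + self.pos_embed[:, :t])
        return torch.sigmoid(self.head(feats)).squeeze(-1)   # (b, t)

if __name__ == "__main__":
    model = TransCNNSketch()
    scores = model(torch.randn(2, 16, 3, 224, 224))  # 2 clips of 16 frames
    print(scores.shape)  # torch.Size([2, 16])
```

Given frame-level scores like these, detection typically follows by thresholding each score or taking a maximum over the clip; since the paper reports frame-level AUC on ShanghaiTech, UCSD Ped2, and CUHK Avenue, a per-frame score is the natural output shape for such a sketch.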
ISSN: 0952-1976 (print); 1873-6769 (electronic)
DOI: 10.1016/j.engappai.2023.106173