TransCNN: Hybrid CNN and transformer mechanism for surveillance anomaly detection
Published in: Engineering Applications of Artificial Intelligence, 2023-08, Vol. 123, Article 106173
Main Authors:
Format: Article
Language: English
Summary: Surveillance video anomaly detection (SVAD) is a challenging task due to the variations in object scale, discrimination, and unexpected events, the impact of the background, and the wide range of definitions of anomalous events across surveillance contexts. In this work, we introduce an end-to-end hybrid convolutional neural network (CNN) and vision transformer-based framework for anomaly detection. The proposed framework uses spatial and temporal information from a surveillance video to detect anomalous events and operates in two steps: in the first step, an efficient backbone CNN model extracts spatial features; in the second step, these features are passed to the transformer-based model to learn the long-term temporal relationships between various complex surveillance events. The features from the backbone model are fed to a sequential learning model in which temporal self-attention is utilised to generate an attention map; this allows the proposed framework to learn spatiotemporal features effectively and to detect anomalous events. Our experimental results on various benchmark VAD datasets confirm the validity of the proposed framework, which outperforms other state-of-the-art approaches by achieving high AUC values of 94.6%, 98.4%, and 89.6% on the ShanghaiTech, UCSD Ped2, and CUHK Avenue datasets, respectively.
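The record does not include code, but the two-step pipeline the summary describes (a CNN backbone extracting per-frame spatial features, followed by a transformer encoder whose temporal self-attention models relationships across frames) can be sketched as below. This is a minimal illustration in PyTorch under assumed choices: the ResNet-18 backbone, embedding size, head count, layer count, and per-frame sigmoid scoring head are placeholders, not details taken from the paper.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class TransCNNSketch(nn.Module):
    """Minimal sketch of the two-step pipeline from the summary:
    (1) a CNN backbone extracts spatial features per frame;
    (2) a transformer encoder applies temporal self-attention across
    the frame sequence. All sizes here are illustrative placeholders."""

    def __init__(self, feat_dim=512, n_heads=8, n_layers=2, max_len=64):
        super().__init__()
        cnn = resnet18(weights=None)
        # Keep everything up to and including global average pooling,
        # dropping the ImageNet classification head.
        self.backbone = nn.Sequential(*list(cnn.children())[:-1])
        # Learned positional embedding so self-attention sees frame order.
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(feat_dim, 1)  # per-frame anomaly score

    def forward(self, clip):
        # clip: (batch, time, channels, height, width)
        b, t, c, h, w = clip.shape
        # Step 1: spatial features for each frame independently.
        feats = self.backbone(clip.reshape(b * t, c, h, w))  # (b*t, 512, 1, 1)
        feats = feats.reshape(b, t, -1)                      # (b, t, 512)
        # Step 2: temporal self-attention over the frame sequence.
        feats = self.temporal(feats + self.pos_embed[:, :t])
        return torch.sigmoid(self.head(feats)).squeeze(-1)   # (b, t)

if __name__ == "__main__":
    model = TransCNNSketch()
    scores = model(torch.randn(2, 16, 3, 224, 224))  # 2 clips of 16 frames
    print(scores.shape)  # torch.Size([2, 16])
```

Given frame-level scores like these, detection typically follows by thresholding each score or taking a maximum over the clip; since the paper reports frame-level AUC on ShanghaiTech, UCSD Ped2, and CUHK Avenue, a per-frame score is the natural output shape for such a sketch.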
ISSN: 0952-1976 (print); 1873-6769 (electronic)
DOI: 10.1016/j.engappai.2023.106173