Temporal Relation-Embedded Transformer for Weakly-Supervised Video Anomaly Detection

Bibliographic Details
Main Authors: Chowdhury, Tonmoyee, Dev, Prabhu Prasad, Baliarsingh, Santos Kumar, Mohanty, Jasaswi Prasad, Biswal, Manas Ranjan
Format: Conference Proceeding
Language: English
Description
Summary: Detecting anomalies in surveillance videos using Weakly-Supervised Learning (WSL) poses a significant challenge in the computer vision community. Recent work on WSL has explored Multiple Instance Learning (MIL) for video anomaly detection, in which a model is trained to generate frame-level anomaly scores for positive and negative bags. However, these techniques cannot effectively capture the temporal relationships between video segments, and they lack sufficiently discriminative features to distinguish normal segments from abnormal ones. To address these challenges, we propose a novel long-term temporal dependency framework. It combines a SlowFast Network (SFNet) for feature discrimination with a FlowTransformer that models temporal dependencies using both historical and current information. A snippet-level classifier then scores anomalous and normal videos. Comprehensive experiments on two benchmark datasets, ShanghaiTech and UCF-Crime, show that our method surpasses existing unsupervised and weakly-supervised methods. It achieves an AUC of 83.5% on UCF-Crime and 90.21% on ShanghaiTech. It also improves on existing methods in terms of false alarm rate (FAR) by at least 1.33% on UCF-Crime and 1.32% on ShanghaiTech.
ISSN: 2688-769X
DOI: 10.1109/SPIN60856.2024.10512312
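The summary describes MIL training in which each video is a bag of segments: anomalous videos form positive bags, normal videos form negative bags, and only the video-level label is known. It also reports results as frame-level AUC. The Python sketch below illustrates both ideas under stated assumptions. The hinge ranking loss is the widely used MIL formulation from earlier weakly-supervised anomaly detectors, not necessarily the objective used in this paper, and it omits any smoothness or sparsity terms. SFNet and the FlowTransformer are not modeled. The function names `mil_ranking_loss` and `frame_auc` are hypothetical.

```python
from typing import Sequence


def mil_ranking_loss(pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     margin: float = 1.0) -> float:
    """Hinge ranking loss between the top-scoring segment of an anomalous
    (positive) bag and the top-scoring segment of a normal (negative) bag.

    Assumption: a generic MIL ranking term, not the paper's exact loss.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both bags must contain at least one segment score")
    # Only bag-level labels are known, so each bag is represented by its
    # highest-scoring segment (the one most likely to be anomalous).
    return max(0.0, margin - max(pos_scores) + max(neg_scores))


def frame_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic: the probability that a random
    anomalous frame (label 1) scores higher than a random normal frame
    (label 0), with ties counted as 0.5. O(n^2), fine for illustration."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal frame")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical segment scores for one anomalous and one normal video.
    anomalous_video = [0.1, 0.2, 0.9, 0.3]
    normal_video = [0.1, 0.15, 0.2, 0.05]
    print(mil_ranking_loss(anomalous_video, normal_video))  # about 0.3
    print(frame_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))    # → 0.75
```

Using the maximum segment score lets the loss push anomalous videos above normal ones without frame-level annotations; the AUC function is threshold-free, which is why it is a common headline metric for this task.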