An attention-augmented driven modified two-fold U-net anomaly detection model for video surveillance systems
Published in: Multimedia Tools and Applications, 2024-03, Vol. 83 (11), pp. 32019-32040
Main Authors: ,
Format: Article
Language: English
Summary: We propose an effective strategy for detecting and localizing anomalous behavior using a modified end-to-end two-stage encoder-decoder U-shaped network, built from scratch for the detection, segmentation, and classification of anomalous events in video sequences. The encoder extracts features that are fed to the bottleneck block, and the resulting feature maps serve as input to the decoder path, which transposes the feature maps back to the original image resolution. For precise localization of anomalies, the proposed U-net model augments both the images and their masks. In the two-stage network, the first U-net detects anomalous video frames, while the same U-net architecture in the second stage operates on the augmented frames detected by the first model and provides their segmentation and classification. This symmetric-path architecture gives good spatial localization of anomalous events. A pixel-based threshold on the Intersection over Union (IoU) score distinguishes the pixels: pixels whose values exceed the threshold are considered anomalous, and the rest are considered normal with an IoU score of 0. We evaluated the two-stage U-net on three standard benchmark datasets and compared it with a conventional U-net and with Attention U-net models without augmentation. By combining spatial details and deep features, our method achieves an improved accuracy of 99.15%, a mean Intersection over Union of 82.33, and a 99% ROC value, higher than those of the compared methods.
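The summary's pixel-level decision rule (binarizing the network's per-pixel output with a threshold and scoring the resulting mask against a reference mask with Intersection over Union) can be sketched roughly as below. This is a minimal illustration only; the function names, threshold values, and the toy arrays are assumptions for demonstration, not details taken from the paper.

```python
import numpy as np

def iou_score(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union between two binary masks; 0 if both are empty."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return np.logical_and(pred, gt).sum() / union

def classify_frame(prob_map: np.ndarray, gt_mask: np.ndarray,
                   pixel_threshold: float = 0.5, iou_threshold: float = 0.5):
    """Binarize per-pixel probabilities with a pixel-based threshold, then label
    the frame anomalous when the predicted mask overlaps the reference mask
    strongly enough; otherwise report it as normal with an IoU score of 0."""
    pred_mask = prob_map > pixel_threshold      # pixel-based thresholding
    score = iou_score(pred_mask, gt_mask)       # overlap with the reference mask
    return ("anomalous", score) if score > iou_threshold else ("normal", 0.0)

# Toy usage: a 4x4 probability map compared against a reference mask.
prob = np.array([[0.9, 0.8, 0.1, 0.0],
                 [0.7, 0.9, 0.2, 0.1],
                 [0.1, 0.2, 0.0, 0.0],
                 [0.0, 0.1, 0.0, 0.0]])
gt = np.zeros((4, 4), dtype=bool)
gt[:2, :2] = True
print(classify_frame(prob, gt))                 # ('anomalous', 1.0)
```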
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-023-16728-5