DTE-Net: Dual Temporal Excitation Network for Video Violence Recognition
Main Authors: | , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Video-based violence recognition has become a crucial topic owing to the spread of surveillance cameras. However, because of the extra temporal dimension and the absence of a precise range for violent content in video data, violence recognition remains a challenging problem. In this study, we propose a dual temporal excitation network (DTE-Net) consisting of a shift temporal adaptive module (STAM) and a sparse object interaction transformer (SOI-Tr) module. The STAM extracts coarse-grained local and global temporal information by fusing a shift module with a temporal adaptive modeling module. The SOI-Tr module uses attention over important objects to excite fine-grained global temporal representation reasoning. In addition, we create a multi-class violence (MCV) dataset of video clips extracted from real-world scenes to address the limited category diversity of most existing violence datasets. Finally, we conduct extensive experiments on five violence datasets, including the MCV, and the results show that our network outperforms state-of-the-art methods. |
---|---|
ISSN: | 1945-788X |
DOI: | 10.1109/ICME52920.2022.9859986 |