DTE-Net: Dual Temporal Excitation Network for Video Violence Recognition

Bibliographic Details
Main Authors: Yan, Wenwei; Wang, Haoxiang; Liu, Qing; Xuan, Jun; Tang, Yuxuan; Mao, Aihua
Format: Conference Proceeding
Language: English
Description
Summary: Video-based violence recognition has become a crucial topic owing to the development of surveillance cameras. However, the extra temporal dimension and the lack of a precise scope for violent video data make violence recognition a challenging problem. In this study, we propose a dual temporal excitation network (DTE-Net) consisting of a shift temporal adaptive module (STAM) and a sparse object interaction transformer (SOI-Tr) module. The STAM extracts coarse-grained local and global temporal information by fusing a shift module with a temporal adaptive modeling module. The SOI-Tr module uses important-object attention to excite fine-grained global temporal representation reasoning. In addition, we create a multi-class violence (MCV) dataset of video clips extracted from real-world scenes to address the limited category diversity of most existing violence datasets. Finally, we conduct extensive experiments on five violence datasets, including the MCV, and the results show that our network outperforms state-of-the-art methods.
ISSN: 1945-788X
DOI: 10.1109/ICME52920.2022.9859986