SCTF: an efficient neural network based on local spatial compression and full temporal fusion for video violence detection

Bibliographic Details
Published in: Multimedia Tools and Applications, 2024-04, Vol. 83 (12), p. 36899-36919
Main Authors: Tan, Zhenhua; Xia, Zhenche; Wang, Pengfei; Wu, Danke; Li, Li
Format: Article
Language:English
Description
Summary: Spatiotemporal modeling is key to action recognition in videos. In this paper, we propose a Spatial features Compression and Temporal features Fusion (SCTF) block, consisting of a Local Spatial features Compression (LSC) module and a Full Temporal features Fusion (FTF) module; we call the network equipped with the SCTF block SCTF-NET, a human action recognition network particularly suited to violent video detection. Previous works typically achieve spatial extraction and temporal fusion by stacking large numbers of convolution layers or by adding complex recurrent layers. In contrast, the SCTF block extracts the spatial information of video frames with the LSC module and fuses the temporal sequence information of consecutive frames with the FTF module, which enables effective spatiotemporal modeling. Our approach achieves good performance on action recognition benchmarks such as HMDB51 and UCF101 while being more efficient in training and detection. Moreover, experiments on the violence datasets Hockey Fights, Movie Fight and Violent Flow show that the proposed SCTF block is well suited to violent action recognition. Our code is available at https://github.com/TAN-OpenLab/SCTF-Net .
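
The abstract does not spell out the internals of the LSC and FTF modules, but the two-stage idea (compress each frame spatially, then fuse across the whole clip) can be sketched. The following minimal PyTorch sketch is an illustration under our own assumptions: the depthwise convolution plus average pooling inside LSC, the learned T-by-T mixing matrix inside FTF, and the names SCTFBlock, LSC and FTF as written here are hypothetical choices, not the paper's actual design.

import torch
import torch.nn as nn

class LSC(nn.Module):
    # Local Spatial Compression (assumed form): compress each frame's
    # features with a lightweight depthwise conv followed by pooling.
    def __init__(self, channels: int):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, kernel_size=3,
                                   padding=1, groups=channels)
        self.pool = nn.AvgPool2d(kernel_size=2)

    def forward(self, frames):              # frames: (batch*T, C, H, W)
        return self.pool(self.depthwise(frames))

class FTF(nn.Module):
    # Full Temporal Fusion (assumed form): a learned T-by-T mixing matrix,
    # so every output frame is a weighted combination of all input frames.
    def __init__(self, num_frames: int):
        super().__init__()
        self.mix = nn.Linear(num_frames, num_frames)

    def forward(self, x):                   # x: (batch, C, T, H, W)
        x = x.permute(0, 1, 3, 4, 2)        # move T to the last axis
        x = self.mix(x)                     # fuse across all frames
        return x.permute(0, 1, 4, 2, 3)     # restore (batch, C, T, H, W)

class SCTFBlock(nn.Module):
    # Spatial compression per frame, then fusion over the full clip.
    def __init__(self, channels: int, num_frames: int):
        super().__init__()
        self.lsc = LSC(channels)
        self.ftf = FTF(num_frames)

    def forward(self, x):                   # x: (batch, C, T, H, W)
        b, c, t, h, w = x.shape
        # Fold time into the batch so LSC runs frame by frame.
        frames = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        frames = self.lsc(frames)
        _, c2, h2, w2 = frames.shape
        x = frames.reshape(b, t, c2, h2, w2).permute(0, 2, 1, 3, 4)
        return self.ftf(x)

if __name__ == "__main__":
    clip = torch.randn(2, 64, 8, 56, 56)            # (batch, C, T, H, W)
    out = SCTFBlock(channels=64, num_frames=8)(clip)
    print(out.shape)                                # torch.Size([2, 64, 8, 28, 28])

Note the contrast the abstract draws: in this sketch, temporal fusion is a single cheap mixing step over the whole clip rather than a stack of 3D convolutions or a recurrent layer, which is where the claimed efficiency in training and detection would come from.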
ISSN: 1380-7501
EISSN: 1573-7721
DOI:10.1007/s11042-023-16269-x