Video object detection via space–time feature aggregation and result reuse

Bibliographic Details
Published in: IET Image Processing, 2024-10, Vol. 18 (12), p. 3356-3367
Main Authors: Duan, Liang, Yang, Rongfei, Yue, Kun, Sun, Zhengbao, Yuan, Guowu
Format: Article
Language: English
Description
Summary: When detecting objects in videos, motion often degrades object appearance, causing blurring and occlusion as well as unusual shapes and postures. Consequently, applying an image object detection model to individual video frames leads to a decline in accuracy. This paper proposes an online video object detection method based on the one-stage detector YOLOx. First, a space–time feature aggregation module is introduced, which uses the space–time information of past frames to enhance the feature quality of the current frame. Then, a result reuse module is introduced, which incorporates the detection results of past frames to improve the detection stability of the current frame. Together, these two modules achieve a trade-off between the accuracy and speed of video object detection. Experimental results on ImageNet VID show improvements in both speed and accuracy for the proposed method.

In brief: a space–time feature aggregation (STFA) module is proposed to retrieve highly relevant space–time features from memory frame features and aggregate them into the current frame; a result reuse module is proposed to carry the stable detection results of past frames to the current frame, making the detection results more stable; and an online video object detection method is obtained by combining the two modules. Detection accuracy is improved by exploiting the space–time information and detection results of past frames in the video sequence, achieving a trade-off between accuracy and speed.
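As a rough illustration of the two ideas summarized above, the following PyTorch sketch shows how memory-frame features might be aggregated into the current frame with attention-style weighting, and how high-confidence detections from a past frame might be carried forward. The class and function names, tensor shapes, score threshold, and the use of scaled dot-product attention are illustrative assumptions and do not reproduce the paper's exact modules.

# Hypothetical sketch of (1) space-time feature aggregation and
# (2) result reuse, as described at a high level in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpaceTimeFeatureAggregation(nn.Module):
    """Aggregate memory-frame features into the current frame (assumed design)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)   # query from current-frame features
        self.k_proj = nn.Linear(dim, dim)   # keys from memory-frame features
        self.v_proj = nn.Linear(dim, dim)   # values from memory-frame features
        self.scale = dim ** -0.5

    def forward(self, cur_feat: torch.Tensor, mem_feat: torch.Tensor) -> torch.Tensor:
        # cur_feat: (N, dim) current-frame feature vectors (e.g. a flattened map)
        # mem_feat: (M, dim) feature vectors sampled from past frames
        q = self.q_proj(cur_feat)                          # (N, dim)
        k = self.k_proj(mem_feat)                          # (M, dim)
        v = self.v_proj(mem_feat)                          # (M, dim)
        attn = F.softmax(q @ k.t() * self.scale, dim=-1)   # (N, M) relevance weights
        aggregated = attn @ v                              # (N, dim) memory context
        return cur_feat + aggregated                       # residual enhancement


def reuse_results(prev_dets: torch.Tensor, cur_dets: torch.Tensor,
                  score_thresh: float = 0.6) -> torch.Tensor:
    """Carry stable (high-confidence) detections of a past frame into the
    current frame's result set; rows are (x1, y1, x2, y2, score)."""
    stable = prev_dets[prev_dets[:, 4] > score_thresh]
    return torch.cat([cur_dets, stable], dim=0)  # NMS would follow in practice


if __name__ == "__main__":
    stfa = SpaceTimeFeatureAggregation(dim=256)
    cur = torch.randn(100, 256)        # current-frame features
    mem = torch.randn(300, 256)        # features pooled from past frames
    enhanced = stfa(cur, mem)
    print(enhanced.shape)              # torch.Size([100, 256])

In this sketch the attention weights play the role of selecting "highly relevant" space-time features from memory, and the result-reuse step simply appends confident past detections before a final suppression step; the actual paper may use a different aggregation and stability criterion.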
ISSN: 1751-9659, 1751-9667
DOI: 10.1049/ipr2.13179