Temporal Feature Fusion for 3D Detection in Monocular Video

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2024-01, Vol. 33, p. 2665-2675
Main Authors: Cheng, Haoran; Peng, Liang; Yang, Zheng; Lin, Binbin; He, Xiaofei; Wu, Boxi
Format: Article
Language: English
Description
Summary: Previous monocular 3D detection works focus on single-frame input in both training and inference. In real-world applications, temporal and motion information naturally exists in monocular video. It is valuable for 3D detection but under-explored in monocular works. In this paper, we propose a straightforward and effective method for temporal feature fusion, which exhibits low computation cost and excellent transferability, making it conveniently applicable to various monocular models. Specifically, with the help of optical flow, we transform the backbone features produced by prior frames and fuse them into the current frame. We introduce a scene feature propagation mechanism, which accumulates historical scene features without extra time cost. In this process, occluded areas are removed via forward-backward scene consistency. Our method naturally introduces valuable temporal features, facilitating 3D reasoning in monocular 3D detection. Furthermore, accumulating historical scene features via scene propagation mitigates the heavy computation overhead of video processing. Experiments are conducted on various baselines, which demonstrate that the proposed method is model-agnostic and can bring significant improvement to multiple types of single-frame methods.
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2024.3378475
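
Illustrative sketch (not taken from the article): the summary describes warping prior-frame backbone features with optical flow, fusing them into the current frame, and discarding occluded areas through a forward-backward consistency check. The NumPy code below shows one way those three steps could fit together. Everything specific in it is an assumption made for illustration: the function names (`warp_nearest`, `consistency_mask`, `fuse`), nearest-neighbor sampling, the tolerance `tol`, and the fixed blending weight `alpha`. The published method may use a different warping scheme and a learned fusion.

```python
import numpy as np


def warp_nearest(feat, flow):
    """Backward-warp feat (H, W, C) so that out[y, x] = feat[y + v, x + u].

    flow is (H, W, 2) holding (u, v) = (dx, dy) per pixel. Nearest-neighbor
    sampling is an assumption made to keep the sketch short.
    """
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(xs + flow[..., 0]).astype(int)
    src_y = np.rint(ys + flow[..., 1]).astype(int)
    # Pixels whose source location falls outside the previous frame.
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    return feat[src_y, src_x], inside


def consistency_mask(flow_cur_to_prev, flow_prev_to_cur, tol=1.0):
    """Forward-backward check: keep a pixel only if following the flow to the
    previous frame and back returns to within tol pixels of where it started."""
    back, inside = warp_nearest(flow_prev_to_cur, flow_cur_to_prev)
    round_trip_error = np.linalg.norm(flow_cur_to_prev + back, axis=-1)
    return inside & (round_trip_error <= tol)


def fuse(cur_feat, prev_feat, flow_cur_to_prev, flow_prev_to_cur,
         alpha=0.5, tol=1.0):
    """Blend warped history features into the current frame where the flow is
    consistent; elsewhere (occluded or out of view) keep the current features.
    The fixed weight alpha is a hypothetical stand-in for the fusion step."""
    warped, _ = warp_nearest(prev_feat, flow_cur_to_prev)
    mask = consistency_mask(flow_cur_to_prev, flow_prev_to_cur, tol)[..., None]
    blended = (1.0 - alpha) * cur_feat + alpha * warped
    return np.where(mask, blended, cur_feat)
```

Under this reading, passing the output of `fuse` back in as `prev_feat` for the next frame would carry accumulated history forward while needing only one warp per frame, which is one plausible interpretation of the scene feature propagation the summary mentions.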