Feature flow: In-network feature flow estimation for video object detection


Saved in:
Bibliographic Details
Published in: Pattern Recognition 2022-02, Vol. 122, p. 108323, Article 108323
Main Authors: Jin, Ruibing, Lin, Guosheng, Wen, Changyun, Wang, Jianliang, Liu, Fayao
Format: Article
Language:English
Summary:
•A shallow module is proposed to directly predict the feature flow for feature alignment within a single network.
•Self-supervised learning is introduced to further improve the quality of the predicted feature flow.
•New state-of-the-art performance is demonstrated in comparison with other methods, while a fast inference speed is maintained.
Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly at the feature level. Since the displacement of feature vectors is not consistent with the pixel displacement, a common approach is to feed optical flow into a neural network and fine-tune this network on the task dataset, expecting the fine-tuned network to produce tensors that encode feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks for the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module directly produces a feature flow that indicates the feature displacement. Our IFF module consists of a shallow module that shares features with the detection branches. This compact design enables our IFF-Net to detect objects accurately while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of our IFF-Net. Our IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID.
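The alignment step that a predicted feature flow enables can be illustrated with a minimal sketch. The function below is hypothetical, not the authors' implementation: it simply warps a reference-frame feature map by a per-location displacement field using bilinear interpolation, which is the standard way a flow field (whether optical flow forwarded through a network, or a feature flow predicted in-network as in the IFF module) is applied to align features between frames. The IFF module itself and the TRL are not reproduced here.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly warp a feature map by a per-location feature flow.

    feat: (C, H, W) feature map from a reference frame.
    flow: (2, H, W) displacement field (dx, dy) in feature-grid units.
    Returns a (C, H, W) feature map sampled at the displaced locations,
    i.e. the reference features aligned to the current frame.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Sampling locations in the reference frame's feature grid.
    sx = xs + flow[0]
    sy = ys + flow[1]
    # Integer corners of the sampling cell, clipped to the grid.
    x0 = np.clip(np.floor(sx).astype(int), 0, W - 1)
    y0 = np.clip(np.floor(sy).astype(int), 0, H - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    # Fractional offsets used as bilinear weights.
    wx = np.clip(sx, 0, W - 1) - x0
    wy = np.clip(sy, 0, H - 1) - y0
    # Interpolate the four neighbouring feature vectors.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy
```

With zero flow the warp is the identity; with a constant flow it shifts every feature vector by that displacement. A detection head would then run on the aggregated, aligned features rather than on each frame independently.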
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2021.108323