Loading…
Self-Sufficient Feature Enhancing Networks for Video Salient Object Detection
Detecting salient objects in videos is a very challenging task. Current state-of-the-art methods are dominated by motion based deep neural networks, among which optical flow is often leveraged as motion representation. Though with robust performance, these optical flow-based video salient object det...
Saved in:
Published in: | IEEE transactions on multimedia 2023, Vol.25, p.557-571 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Detecting salient objects in videos is a very challenging task. Current state-of-the-art methods are dominated by motion based deep neural networks, among which optical flow is often leveraged as motion representation. Though with robust performance, these optical flow-based video salient object detection methods face at least two problems that may hinder their generalization and application. First, computing optical flow as a pre-processing step does not support direct end-to-end learning; second, little attention has been given to the quality of visual features due to high computational cost of spatiotemporal feature encoding. In this paper we propose a novel self-sufficient feature enhancing network (SFENet) for video salient object detection, which leverages optical flow estimation as an auxiliary task while being end-to-end trainable. With a joint training scheme of both salient object detection and optical flow estimation, its multi-task architecture can be totally self-sufficient for achieving good performance without any pre-processing. Furthermore, for improving feature quality, we design four lightweight modules in spatial and temporal domains, including cross-layer fusion, multi-level warping, spatial-channel attention and boundary-aware refinement. The proposed method is evaluated through extensive experiments on five video salient object detection datasets. Experimental results show that our SFENet can be easily trained with fast convergence speed. It significantly outperforms previous methods in terms of various evaluation metrics. Moreover, with optical flow estimation and unsupervised video object segmentation as example applications, our method also yields state-of-the-art results on standard datasets. |
---|---|
ISSN: | 1520-9210 1941-0077 |
DOI: | 10.1109/TMM.2021.3129052 |