Transformer-based difference fusion network for RGB-D salient object detection

Bibliographic Details
Published in: Journal of Electronic Imaging, 2022-11, Vol. 31 (6), p. 063058
Main Authors: Cui, Zhi-Qiang; Wang, Feng; Feng, Zheng-Yong
Format: Article
Language:English
Description
Summary: RGB-D salient object detection (SOD) can usually be divided into three stages: feature extraction, feature fusion, and feature prediction. Most approaches treat the feature information extracted by the backbone network identically in the final two stages, neglecting the fact that different modalities and different hierarchical features play distinct roles in SOD, which leads to poor detection results. To solve this problem, we propose a transformer-based difference fusion network (TDF-Net) for RGB-D SOD that treats modal features and hierarchical features differently in the feature fusion and feature prediction stages, respectively. First, we adopt the pyramid vision transformer as a feature extractor to obtain hierarchical features from the input RGB and depth images. Second, we propose a differential interactive fusion module, in which the RGB and depth modalities learn modality-specific features independently and then guide each other to fuse features. Finally, we divide the hierarchical features after cross-modal fusion into high-level and low-level features and propose three types of cross-layer fusion modules to discriminatively integrate features from different layers and predict the saliency maps. Extensive experiments on five benchmark datasets confirm that the proposed TDF-Net outperforms state-of-the-art methods.
ISSN: 1017-9909, 1560-229X
DOI: 10.1117/1.JEI.31.6.063058
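
The abstract only outlines the cross-modal fusion idea, so the following is a minimal PyTorch sketch of what a "differential interactive fusion" step could look like: each modality is refined independently, a spatial attention map is derived from the RGB-depth feature difference, and the two modalities then guide each other before a final fusion. The module name, layer choices, and the difference-attention branch are illustrative assumptions, not the authors' published implementation.

# Minimal sketch of cross-modal mutual-guidance fusion in the spirit of the abstract.
# All design details here (names, layers, difference attention) are assumptions.
import torch
import torch.nn as nn


class DifferentialInteractiveFusion(nn.Module):
    """Fuses same-resolution RGB and depth feature maps with mutual guidance."""

    def __init__(self, channels: int):
        super().__init__()
        # Modality-specific refinement: each branch learns its own features.
        self.refine_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.refine_depth = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Spatial attention computed from the cross-modal feature difference (assumed design).
        self.diff_attn = nn.Sequential(nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())
        # Final fusion of the two mutually guided feature maps.
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, f_rgb: torch.Tensor, f_depth: torch.Tensor) -> torch.Tensor:
        r = self.refine_rgb(f_rgb)
        d = self.refine_depth(f_depth)
        # Attention map highlighting regions where the two modalities disagree.
        attn = self.diff_attn(torch.abs(r - d))
        # Each modality is complemented by the other, weighted by the difference attention.
        r_guided = r + attn * d
        d_guided = d + attn * r
        return self.fuse(torch.cat([r_guided, d_guided], dim=1))


if __name__ == "__main__":
    # Example: fuse one level of PVT-like feature maps (batch 2, 64 channels, 56x56).
    fuser = DifferentialInteractiveFusion(channels=64)
    rgb_feat = torch.randn(2, 64, 56, 56)
    depth_feat = torch.randn(2, 64, 56, 56)
    print(fuser(rgb_feat, depth_feat).shape)  # torch.Size([2, 64, 56, 56])

In a full TDF-Net-style pipeline, one such fusion block would presumably be applied at each backbone level, with the fused hierarchical features then split into high-level and low-level groups for the cross-layer fusion and prediction stages described in the abstract.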