Three-Stream Attention-Aware Network for RGB-D Salient Object Detection
Published in: | IEEE Transactions on Image Processing, June 2019, vol. 28, no. 6, pp. 2825-2835 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | Previous RGB-D fusion systems based on convolutional neural networks typically employ a two-stream architecture, in which RGB and depth inputs are learned independently. The multi-modal fusion stage is typically performed by concatenating the deep features from each stream in the inference process. The traditional two-stream architecture may suffer from insufficient multi-modal fusion because of the following two limitations: (1) cross-modal complementarity is rarely studied in the bottom-up path, where we believe cross-modal complements can be combined to learn new discriminative features that enrich the RGB-D representation; and (2) cross-modal channels are typically combined by undifferentiated concatenation, which makes the selection of cross-modal complementary features ambiguous. In this paper, we address these two limitations by proposing a novel three-stream attention-aware multi-modal fusion network. In the proposed architecture, a cross-modal distillation stream, accompanying the RGB-specific and depth-specific streams, is introduced to extract new RGB-D features at each level of the bottom-up path. Furthermore, a channel-wise attention mechanism is introduced to the cross-modal cross-level fusion problem to adaptively select complementary feature maps from each modality at each level. Extensive experiments demonstrate the effectiveness of the proposed architecture and a significant improvement over state-of-the-art RGB-D salient object detection methods. |
---|---|
ISSN: | 1057-7149 (print), 1941-0042 (electronic) |
DOI: | 10.1109/TIP.2019.2891104 |
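To illustrate the difference between undifferentiated concatenation and channel-wise attention described in the summary, here is a minimal pure-Python sketch of attention-weighted fusion at a single level. It assumes a squeeze-and-excitation-style gate (global average pooling per channel, a two-layer bottleneck with ReLU, then a sigmoid). That gate design is an assumption, not necessarily the paper's exact formulation. The function name `channel_attention_fuse`, the weight matrices `w1`/`w2`, and the toy layout (each channel stored as a flattened list of spatial values) are hypothetical. The sketch concatenates the RGB, depth, and cross-modal channels, then rescales each channel by its own learned weight instead of treating all channels equally.

```python
import math
from typing import List, Tuple

Channel = List[float]        # one flattened H*W feature map (toy layout, hypothetical)
FeatureMaps = List[Channel]  # C channels at one level
Matrix = List[List[float]]   # hypothetical learned weights, row-major


def global_avg_pool(features: FeatureMaps) -> List[float]:
    """Squeeze step: one scalar descriptor per channel (mean over spatial positions)."""
    return [sum(ch) / len(ch) for ch in features]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention_fuse(
    rgb: FeatureMaps,
    depth: FeatureMaps,
    cross: FeatureMaps,
    w1: Matrix,  # shape (hidden, C): bottleneck layer (assumed SE-style)
    w2: Matrix,  # shape (C, hidden): expansion layer back to one weight per channel
) -> Tuple[FeatureMaps, List[float]]:
    """Hypothetical SE-style fusion of three streams at one level.

    Returns the reweighted channels and the per-channel attention weights.
    """
    stacked = rgb + depth + cross  # plain channel concatenation of the three streams
    z = global_avg_pool(stacked)
    if any(len(row) != len(z) for row in w1):
        raise ValueError("each row of w1 must have one entry per channel")
    # Bottleneck with ReLU.
    hidden = [max(0.0, sum(w * zi for w, zi in zip(row, z))) for row in w1]
    if len(w2) != len(stacked) or any(len(row) != len(hidden) for row in w2):
        raise ValueError("w2 must map the hidden vector to one weight per channel")
    # One weight in (0, 1) per channel decides how much that feature map contributes.
    weights = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    fused = [[a * v for v in ch] for a, ch in zip(weights, stacked)]
    return fused, weights


if __name__ == "__main__":
    rgb = [[1.0, 3.0]]    # 1 channel, 2 spatial positions
    depth = [[0.0, 2.0]]
    cross = [[4.0, 4.0]]  # stand-in for a cross-modal distillation channel
    w1 = [[1.0, 0.0, 0.0]]

    # Zero expansion weights give identical gates: every channel is scaled by 0.5,
    # which is just concatenation up to a constant (no selection happens).
    _, uniform = channel_attention_fuse(rgb, depth, cross, w1, [[0.0], [0.0], [0.0]])
    print(uniform)  # → [0.5, 0.5, 0.5]

    # Non-uniform weights emphasize some channels and suppress others.
    fused, weights = channel_attention_fuse(rgb, depth, cross, w1, [[1.0], [-1.0], [0.0]])
    print(weights, fused)
```

The design choice the sketch highlights is that concatenation alone implicitly gives every channel the same weight. The gate instead lets each feature map's contribution depend on a global summary of all three streams. In a real network `w1` and `w2` would be trained, and the feature maps would be tensors rather than Python lists.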