Transformer guidance dual-stream network for salient object detection in optical remote sensing images

Bibliographic Details
Published in:	Neural computing & applications 2023-08, Vol.35 (24), p.17733-17747
Main Authors:	Zhang, Yi, Guo, Jichang, Yue, Huihui, Yin, Xiangjun, Zheng, Sida
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Computational Biology/Bioinformatics; Computational Science and Engineering; Computer Science; Data Mining and Knowledge Discovery; Datasets; Hierarchies; Image Processing and Computer Vision; Modules; Object recognition; Original Article; Probability and Statistics in Computer Science; Remote sensing; Salience; Transformers

Description
Summary:	Salient object detection (SOD) has achieved remarkable performance on natural scene images (NSIs). However, current SOD methods still face serious challenges when processing optical remote sensing images (RSIs), whose cluttered backgrounds, diverse object scales, and varied viewpoints distinguish them from NSIs. In this paper, a transformer guidance dual-stream network (TGDNet) is proposed for SOD in optical RSIs. The key insight is to extract multi-scale features with global receptive fields and to refine them separately according to the characteristics of each level of the feature hierarchy. Specifically, inspired by the long-range dependencies captured by transformers, a transformer guidance dual-stream strategy is proposed that uses global information to compensate for extracted details such as boundaries and edges. To address the diverse scales of salient objects in optical RSIs, a sequence inheritance channel attention module is built to focus on high-level semantic features at different scales. In addition, a pyramid spatial attention module is designed to refine low-level features and suppress background interference for accurate SOD in optical RSIs. Finally, a coarse-to-fine decoder progressively predicts the salient objects. In the experiments, the EORSSD dataset is used to train and evaluate the proposed TGDNet, which achieves 0.0049, 0.8964, and 0.9286 in terms of MAE, F-measure, and S-measure, respectively. The ORSSD dataset is also used to evaluate generalization. Experimental results demonstrate the advantages of TGDNet over state-of-the-art SOD methods.
ISSN:	0941-0643 1433-3058
DOI:	10.1007/s00521-023-08640-8
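
The summary reports results in terms of MAE and F-measure. As a rough illustration of how these two metrics are commonly computed in the SOD literature, the sketch below scores a predicted saliency map against a binary ground-truth mask. It is not the authors' evaluation code: the function names, the fixed binarization threshold of 0.5, and the toy 2x2 inputs are assumptions made for this example, and the weighting beta^2 = 0.3 is the value conventionally used in SOD benchmarks, which the record itself does not state. The S-measure is omitted.

```python
def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth.

    Both inputs are 2D lists of floats in [0, 1] with identical shape.
    """
    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1
    return total / count


def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure after binarizing the prediction.

    threshold is an assumed fixed cut-off; beta_sq = 0.3 is the
    conventional SOD weighting that favors precision over recall.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            predicted = p >= threshold
            actual = g >= 0.5
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Toy example (hypothetical values, not data from the paper).
pred = [[0.9, 0.2], [0.8, 0.1]]
gt = [[1.0, 0.0], [0.0, 0.0]]
print("MAE:", mae(pred, gt))
print("F-measure:", f_measure(pred, gt))
```

Lower MAE is better, while higher F-measure is better, which is why the reported MAE of 0.0049 sits near zero and the F-measure of 0.8964 sits near one.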