Global feature-based multimodal semantic segmentation

Bibliographic Details
Published in: Pattern Recognition, 2024-07, Vol. 151, Article 110340
Main Authors: Gao, Suining, Yang, Xiubin, Jiang, Li, Fu, Zongqiang, Du, Jiamin
Format: Article
Language:English
Description
Summary:
• We introduce a novel GFBN architecture that specializes in the efficient aggregation of representative cross-modal features, significantly enhancing multimodal semantic segmentation.
• We propose the CARM to correct multimodal features by exchanging complementary information between them in the spatial dimensions. Additionally, we introduce the CFM to perform long-range feature fusion. The BGM is also presented; it adaptively generates boundary priors that sharpen feature boundaries, thereby improving segmentation performance.
• In concluding experiments on two multimodal datasets, our method demonstrates superior performance over other state-of-the-art methods, and ablation studies highlight the contribution of each proposed component to this result.

Incorporating a complementary modality into the RGB branch can significantly improve the effectiveness of semantic segmentation. However, fusing the two modalities is challenging because of the difference in their optical dimensions, and existing fusion methods cannot balance performance and efficiency when aggregating detailed features. To address this problem, we propose a global feature-based network (GFBN) for semantic segmentation that establishes mapping and extraction relationships among the modalities. GFBN contains three key modules, responsible for feature correction, fusion, and edge enhancement. First, the cross-attention rectification module (CARM) adaptively extracts mapping relationships and rectifies the RGB and complementary features. Second, the cross-field fusion module (CFM) integrates the long-range rectified features of the two branches into an optimal fused feature. Finally, the boundary guidance module (BGM) sharpens the boundary information of the fused features, effectively improving segmentation accuracy at object boundaries. We evaluate GFBN on the challenging MCubeS and ZJU-RGB-P datasets, where it outperforms state-of-the-art methods by at least 0.64% and 0.7% mean intersection over union (mIoU), respectively, demonstrating both the performance and the efficiency of the proposed method. The code for our method is available at https://github.com/Sci-Epiphany/GFBNext.
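To make the CARM description above concrete, here is a minimal, illustrative PyTorch sketch of a cross-attention exchange between RGB and complementary-modality features. This is not the authors' implementation (see their repository linked above for the real GFBN code); the class name, tensor shapes, and the use of nn.MultiheadAttention are assumptions made purely for illustration.

```python
# Hypothetical sketch of a CARM-style cross-attention rectification step.
# All names and design choices here are assumptions, not the published GFBN code.
import torch
import torch.nn as nn

class CrossAttentionRectification(nn.Module):
    """Each modality attends to the other and adds the attended
    complementary information as a residual correction."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.rgb_from_aux = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.aux_from_rgb = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_rgb = nn.LayerNorm(dim)
        self.norm_aux = nn.LayerNorm(dim)

    def forward(self, rgb: torch.Tensor, aux: torch.Tensor):
        # rgb, aux: (B, N, C) token sequences from flattened feature maps.
        rgb_corr, _ = self.rgb_from_aux(query=rgb, key=aux, value=aux)
        aux_corr, _ = self.aux_from_rgb(query=aux, key=rgb, value=rgb)
        # Residual exchange of complementary information, then normalization.
        return self.norm_rgb(rgb + rgb_corr), self.norm_aux(aux + aux_corr)

if __name__ == "__main__":
    carm = CrossAttentionRectification(dim=64)
    rgb = torch.randn(2, 196, 64)  # e.g. a flattened 14x14 RGB feature map
    aux = torch.randn(2, 196, 64)  # complementary modality (e.g. polarization)
    rgb_out, aux_out = carm(rgb, aux)
    print(rgb_out.shape, aux_out.shape)  # torch.Size([2, 196, 64]) each
```

The residual form keeps each branch's own features intact while injecting only the attended complementary signal, which matches the abstract's description of "correcting" rather than replacing the multimodal features.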
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110340