
DHFNet: dual-decoding hierarchical fusion network for RGB-thermal semantic segmentation

Bibliographic Details
Published in:	The Visual Computer 2024, Vol.40 (1), p.169-179
Main Authors:	Cai, Yuqi; Zhou, Wujie; Zhang, Liting; Yu, Lu; Luo, Ting
Format:	Article
Language:	English
Subjects:	Artificial Intelligence; Computer Graphics; Computer Science; Image Processing and Computer Vision; Original Article

Description
Summary:	Recently, red-green-blue (RGB) and thermal (RGB-T) data have attracted considerable interest for semantic segmentation because they provide robust imaging under the complex lighting conditions of urban roads. Most existing RGB-T semantic segmentation methods adopt an encoder-decoder structure, and repeated upsampling causes semantic information loss during decoding. Moreover, simple cross-modality fusion neither fully mines complementary information from the different modalities nor removes noise from the extracted features. To address these problems, we developed a dual-decoding hierarchical fusion network (DHFNet) to extract RGB and thermal information for RGB-T semantic segmentation. DHFNet uses a novel two-layer decoder and implements boundary refinement and boundary-guided foreground/background enhancement modules. These modules process features from different levels to achieve global guidance and local refinement of the segmentation prediction. In addition, an adaptive attention-filtering fusion module filters and extracts complementary information from the RGB and thermal modalities. Further, we introduce a graph convolutional network and an atrous spatial pyramid pooling module to obtain multiscale features and deepen the extracted semantic information. Experimental results on two benchmark datasets showed that the proposed DHFNet performed well relative to state-of-the-art semantic segmentation methods in terms of different evaluation metrics.
ISSN:	0178-2789; 1432-2315
DOI:	10.1007/s00371-023-02773-6