Gradient Decoupled Learning With Unimodal Regularization for Multimodal Remote Sensing Classification

Bibliographic Details
Published in:	IEEE transactions on geoscience and remote sensing 2024, Vol.62, p.1-12
Main Authors:	Wei, Shicai; Luo, Chunbo; Ma, Xiaoguang; Luo, Yang
Format:	Article
Language:	English
Subjects:	Classification; Convolutional neural networks; decoupling learning; deep learning; Feature extraction; Fuses; Image classification; Laser radar; multimodal; Optimization; Probabilistic logic; Remote sensing; remote sensing (RS); Training; Transformers

Description
Summary:	The joint use of multisource remote-sensing data for Earth observation has drawn much attention due to its robust performance. Although many methods have been proposed to fuse multimodal data, they tend to improve the interaction between modalities while ignoring the optimization of each individual modality. Existing studies show that high-performance modalities suppress the learning of weak ones, leading to under-optimized multimodal learning. To this end, we propose a general framework called gradient decoupled network (GDNet) to assist multimodal remote sensing (RS) classification. GDNet guides each modality encoder in the multimodal model to learn probabilistic representations instead of deterministic ones. This helps decouple their gradients, reducing their influence on each other and encouraging them to learn modality-specific information. We further introduce unimodal regularization for each modality encoder to align its logit output with both the multimodal output and the label distribution simultaneously. This introduces an independent gradient path for each modality encoder, accelerating its optimization while preserving the modality-shared information. Finally, extensive experiments conducted on three benchmark datasets demonstrate that the proposed GDNet can effectively address the under-optimization problem in multimodal RS image classification. Code is available at https://github.com/shicaiwei123/TGRS-GDNet .
ISSN:	0196-2892 1558-0644
DOI:	10.1109/TGRS.2024.3478393
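
Illustrative sketch:	The Summary names two ingredients: probabilistic modality representations and a unimodal regularizer that aligns each encoder's logits with both the multimodal logits and the labels. The record does not give the exact formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (see the linked repository for that). It assumes a Gaussian representation sampled with the reparameterization trick, cross-entropy for the label term, and a KL divergence for the logit alignment term. The function names and the weight `alpha` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class index `label`."""
    return -math.log(softmax(logits)[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def sample_representation(mean, log_var, rng):
    """ASSUMPTION: Gaussian representation, z = mean + sigma * eps."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, log_var)]


def unimodal_regularizer(unimodal_logits, multimodal_logits, label, alpha=1.0):
    """ASSUMPTION: label term (cross-entropy) plus alignment term (KL from
    the multimodal prediction to the unimodal one), weighted by `alpha`."""
    label_term = cross_entropy(unimodal_logits, label)
    align_term = kl_divergence(softmax(multimodal_logits),
                               softmax(unimodal_logits))
    return label_term + alpha * align_term


if __name__ == "__main__":
    rng = random.Random(0)
    z = sample_representation([0.2, -0.1], [-2.0, -2.0], rng)
    loss = unimodal_regularizer([1.5, 0.3, -0.4], [2.0, 0.1, -1.0], label=0)
    print(z, loss)
```

In this sketch the regularizer is computed per modality, so each encoder receives a training signal that does not pass through the fused prediction; when the unimodal and multimodal logits already agree, the alignment term vanishes and only the label term remains.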