
ICSF: Integrating Inter-Modal and Cross-Modal Learning Framework for Self-Supervised Heterogeneous Change Detection

Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2025, Vol. 63, p. 1-16
Main Authors: Zhang, Erlei, Zong, He, Li, Xinyu, Feng, Mingchen, Ren, Jinchang
Format: Article
Language: English
Description
Summary: Heterogeneous change detection (HCD) is the process of determining change information by analyzing heterogeneous images of the same geographic location acquired at different times, and it plays an important role in remote sensing applications such as disaster response and environmental monitoring. However, the different imaging mechanisms produce different visual appearances in heterogeneous images, making it difficult to detect changes accurately through direct comparison. To address this problem, we propose an inter-modal and cross-modal self-supervised dual-branch learning framework (ICSF) for HCD. First, in the inter-modal branch, we perform contrastive learning on heterogeneous images within their respective modalities to learn robust and discriminative features, rather than relying on the raw spectral or spatial information of these images. Second, in the cross-modal branch, we perform cross-modal reconstruction so that the obtained features are directly comparable, thereby facilitating the extraction of rich information about the real changes within the images. Next, the difference images (DIs) computed from both branches are further refined using a superpixel segmentation strategy to preserve the consistency of differences within the same ground object. Experimental results on five public datasets with different modality combinations and change events demonstrate the effectiveness of the proposed approach in comparison with ten state-of-the-art (SOTA) methods, achieving the best performance with an average overall accuracy (OA) of 95.88% and an average Kappa coefficient (KC) of 74.20%.
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3519195
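
The abstract above describes a dual-branch design: contrastive learning within each modality, cross-modal reconstruction to make the two feature spaces comparable, and difference images that are later refined by superpixel segmentation. The minimal PyTorch sketch below illustrates that general idea only; the encoder/decoder architectures, the placeholder augmentation, the InfoNCE formulation, and the loss weighting are illustrative assumptions and not the authors' implementation, which is detailed in the full paper.

```python
# Hypothetical sketch of the dual-branch idea (inter-modal contrastive
# learning + cross-modal reconstruction). All module sizes, the augmentation,
# and the loss terms are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Small convolutional encoder for one modality (e.g., optical or SAR)."""
    def __init__(self, in_ch, feat_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_ch, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """Decoder that reconstructs the *other* modality from learned features."""
    def __init__(self, feat_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(feat_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )
    def forward(self, f):
        return self.net(f)

def info_nce(z1, z2, tau=0.1):
    """Per-pixel InfoNCE loss between two views of the same image."""
    z1 = F.normalize(z1.flatten(2).permute(0, 2, 1), dim=-1)  # B x N x C
    z2 = F.normalize(z2.flatten(2).permute(0, 2, 1), dim=-1)
    logits = torch.bmm(z1, z2.transpose(1, 2)) / tau           # B x N x N
    labels = torch.arange(logits.size(1), device=logits.device)
    labels = labels.unsqueeze(0).expand(logits.size(0), -1)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1))

# Toy tensors standing in for a pre-event optical patch (3 bands) and a
# post-event SAR patch (1 band) covering the same scene.
x_opt = torch.randn(2, 3, 32, 32)
x_sar = torch.randn(2, 1, 32, 32)

enc_opt, enc_sar = Encoder(3), Encoder(1)
dec_opt2sar, dec_sar2opt = Decoder(64, 1), Decoder(64, 3)

f_opt, f_sar = enc_opt(x_opt), enc_sar(x_sar)

# Inter-modal branch: contrast each image with an augmented view of itself,
# within its own modality (placeholder noise augmentation).
aug = lambda x: x + 0.05 * torch.randn_like(x)
loss_inter = info_nce(f_opt, enc_opt(aug(x_opt))) + \
             info_nce(f_sar, enc_sar(aug(x_sar)))

# Cross-modal branch: reconstruct one modality from the other's features so
# that the two feature spaces become directly comparable.
loss_cross = F.l1_loss(dec_opt2sar(f_opt), x_sar) + \
             F.l1_loss(dec_sar2opt(f_sar), x_opt)

loss = loss_inter + loss_cross  # equal weighting, chosen arbitrarily here

# A per-pixel difference image from the aligned features; in the paper the
# DIs from both branches are further refined with superpixel segmentation.
di = torch.norm(f_opt - f_sar, dim=1)
print(loss.item(), di.shape)
```

The sketch only demonstrates how the two losses could be combined on a single heterogeneous image pair; the superpixel-based DI refinement and fusion steps mentioned in the abstract are not reproduced here.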