MLKNet: Multi-Stage for Remote Sensing Image Spatiotemporal Fusion Network Based on a Large Kernel Attention

Bibliographic Details
Published in:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, pp. 1257-1268
Main Authors:	Jiang, Hao; Qian, Yurong; Yang, Guangqi; Liu, Hui
Format:	Article
Language:	English
Subjects:	Algorithms; Attention mechanism; Computer applications; Computer vision; Computing costs; Convolution; Deep learning; Feature extraction; Gaofen-1 moderate-resolution imaging spectroradiometer (GF1-MODIS); Image fusion; Image resolution; Inclusions; Information processing; Kernels; Machine learning; Modules; multi-scale; Remote sensing; remote sensing images; Satellite imagery; Satellites; Sensors; Spatial resolution; spatio-temporal fusion; Spatiotemporal phenomena; Spectroradiometers; Temporal variations; Transformers

Description
Summary:	Among current deep learning-based spatiotemporal fusion algorithms, those that rely solely on convolution cannot efficiently extract global image information. In addition, fusion networks that combine convolution with transformers neglect the 2-D structure of remote sensing images and the role of their channels during training, which increases computational cost. Existing complex fusion methods also introduce noise and disregard the correlation between the time-varying features of low-resolution images and the spatial features of high-resolution images. To address these issues, we first propose TFNet, a temporal feature extraction network that combines normal and deep convolutions to better extract temporal features while reducing computational cost. Second, we propose replacing the transformer with a convolution-based large kernel attention module (LAM), which adjusts features in both the spatial and channel dimensions while preserving the image structure. Third, to improve image fusion, we propose a two-stage fusion module that merges feature maps of different scales. This module integrates features of varying scales and resolutions from multiple perspectives, thereby preventing the inclusion of noise and producing favourable fusion results. Finally, to encourage the application of spatiotemporal fusion techniques to other satellites, we introduce a new dataset, SW, built from Gaofen-1 and Moderate Resolution Imaging Spectroradiometer (MODIS) images.
ISSN:	1939-1404, 2151-1535
DOI:	10.1109/JSTARS.2023.3338978
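The Summary describes LAM only as a convolution-based attention module with a large kernel that replaces the transformer and adjusts features in the spatial and channel dimensions. The NumPy sketch below is not MLKNet's LAM. It illustrates the generic large kernel attention (LKA) decomposition from the Visual Attention Network (Guo et al., 2022), assuming a 5×5 depthwise convolution, then a 7×7 depthwise convolution with dilation 3, then a 1×1 convolution whose output gates the input element-wise. The function names and the `params` dictionary layout are hypothetical.

```python
import numpy as np

def depthwise_conv2d(x, kernels, dilation=1):
    """Depthwise 2-D cross-correlation with 'same' zero padding.

    x: (C, H, W) feature map; kernels: (C, k, k), one kernel per channel (k odd).
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    pad = dilation * (k - 1) // 2          # keeps output size equal to input size
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros(x.shape)
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input for kernel tap (i, j)
            patch = xp[:, i * dilation:i * dilation + h, j * dilation:j * dilation + w]
            out += kernels[:, i, j][:, None, None] * patch
    return out

def lka_attention_map(x, params):
    """Hypothetical LKA-style attention map; the (C, H, W) layout is never flattened."""
    a = depthwise_conv2d(x, params["dw5"])                   # local 5x5 context
    a = depthwise_conv2d(a, params["dw7_d3"], dilation=3)    # long-range 7x7, dilation 3
    return np.einsum("oc,chw->ohw", params["pw"], a)         # 1x1 conv mixes channels

def large_kernel_attention(x, params):
    """Gate the input element-wise, reweighting both spatial positions and channels."""
    return x * lka_attention_map(x, params)

# Usage: an impulse reveals the receptive field of the attention map.
C, H, W = 2, 31, 31
x = np.zeros((C, H, W))
x[0, 15, 15] = 1.0
params = {
    "dw5": np.ones((C, 5, 5)),
    "dw7_d3": np.ones((C, 7, 7)),
    "pw": np.eye(C),  # identity 1x1 conv keeps the channels separate
}
attn = lka_attention_map(x, params)
out = large_kernel_attention(x, params)
print(np.count_nonzero(attn[0]))  # 529 = 23 * 23
```

Chaining a 5×5 kernel with a 7×7 kernel dilated by 3 gives each output a 23×23 receptive field while using only 25 + 49 weights per channel, which is the kind of cheap, structure-preserving global context the Summary attributes to replacing the transformer with a large-kernel convolutional module.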