MCPT: Mixed Convolutional Parallel Transformer for Polarimetric SAR Image Classification

Bibliographic Details
Published in:	Remote sensing (Basel, Switzerland), 2023-06, Vol.15 (11), p.2936
Main Authors:	Wang, Wenke, Wang, Jianlong, Lu, Bibo, Liu, Boyuan, Zhang, Yake, Wang, Chunyang
Format:	Article
Language:	English
Subjects:	Artificial satellites in remote sensing; Classification; Coders; Complexity; Computational linguistics; convolutional neural network; Decomposition; Deep learning; Embedding; Feature extraction; global average pooling; Image classification; Language processing; Latency; mixed depthwise convolution tokenization; Natural language interfaces; Network latency; Neural networks; Optimization; parallel encoder; polarimetric SAR; Polarimetry; Radar imaging; Radarsat; Remote sensing; Synthetic aperture radar; Teaching methods; Training; vision transformer

Description
Summary:	Vision transformers (ViTs) require massive training data and have complex architectures, so they cannot be applied directly to polarimetric synthetic aperture radar (PolSAR) image classification. Therefore, a mixed convolutional parallel transformer (MCPT) model based on ViT is proposed for fast PolSAR image classification. First, a mixed depthwise convolution tokenization replaces the learnable linear projection of the original ViT to obtain patch embeddings. This tokenization reduces computational and parameter complexity and extracts features from different receptive fields as input to the encoder. Furthermore, drawing on the idea that shallow networks have lower latency and are easier to optimize, a parallel encoder is built by pairing identical modules and recombining them into parallel blocks, which reduces network depth and computing-power requirements. In addition, the original class embedding and position embedding are removed during tokenization, and a global average pooling layer is added after the encoder to extract category features. Finally, experimental results on the AIRSAR Flevoland and RADARSAT-2 San Francisco datasets show that the proposed method significantly improves training and prediction speed. The overall accuracies achieved were 97.9% and 96.77%, respectively.
ISSN:	2072-4292
DOI:	10.3390/rs15112936
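The summary names three architectural ideas: mixed depthwise convolution tokenization, parallel encoder blocks, and global average pooling in place of a class token. The following is a minimal, hypothetical pure-Python sketch of how these pieces could fit together. It is not the authors' implementation. It makes several assumptions:

- Kernels are fixed mean filters rather than learned weights.
- Channels are split evenly across kernel sizes, MixConv-style.
- Every pixel becomes one token.
- A parallel block uses the residual-sum form x + a(x) + b(x).

All function names, kernel sizes, and toy branches are illustrative only.

```python
# Hypothetical sketch (not the MCPT authors' code) of three ideas from the summary:
# mixed depthwise convolution tokenization, a parallel residual block, and
# global average pooling instead of a class token. Pure Python, nested lists.
from typing import Callable, List

Matrix = List[List[float]]          # one channel, H x W
Tokens = List[List[float]]          # N tokens, each of dimension C
Branch = Callable[[Tokens], Tokens]


def depthwise_conv2d(channel: Matrix, kernel: Matrix) -> Matrix:
    """Apply one odd-sized kernel to one channel with zero 'same' padding.

    Like most deep learning 'convolutions', this is a cross-correlation.
    """
    h, w, k = len(channel), len(channel[0]), len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:
                        acc += channel[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def mixed_depthwise_tokenize(x: List[Matrix], kernel_sizes: List[int]) -> Tokens:
    """Split the C channels into groups, one kernel size per group (MixConv-style),
    filter each channel on its own, then turn every pixel into a C-dim token.

    Assumption: fixed mean-filter kernels stand in for learned weights.
    No class token and no position embedding are added.
    """
    c, groups = len(x), len(kernel_sizes)
    filtered = []
    for ch in range(c):
        k = kernel_sizes[ch * groups // c]  # evenly assign channel ch to a group
        kernel = [[1.0 / (k * k)] * k for _ in range(k)]
        filtered.append(depthwise_conv2d(x[ch], kernel))
    h, w = len(x[0]), len(x[0][0])
    return [[filtered[ch][i][j] for ch in range(c)] for i in range(h) for j in range(w)]


def parallel_block(tokens: Tokens, branch_a: Branch, branch_b: Branch) -> Tokens:
    """Assumed residual-sum form x + a(x) + b(x): both branches read the same input,
    so the pair adds one sequential step instead of two."""
    a, b = branch_a(tokens), branch_b(tokens)
    return [[t + u + v for t, u, v in zip(tok, ta, tb)]
            for tok, ta, tb in zip(tokens, a, b)]


def global_average_pool(tokens: Tokens) -> List[float]:
    """Average over all tokens to get one C-dim feature vector for the classifier."""
    n = len(tokens)
    return [sum(tok[d] for tok in tokens) / n for d in range(len(tokens[0]))]


def halve(tokens: Tokens) -> Tokens:   # hypothetical stand-in for a trained sub-layer
    return [[0.5 * v for v in tok] for tok in tokens]


def zero(tokens: Tokens) -> Tokens:    # hypothetical stand-in for a trained sub-layer
    return [[0.0 for _ in tok] for tok in tokens]


if __name__ == "__main__":
    # Toy 4-channel 3 x 3 input; channel c is the constant c + 1 (values are arbitrary).
    x = [[[float(c + 1)] * 3 for _ in range(3)] for c in range(4)]
    tokens = mixed_depthwise_tokenize(x, kernel_sizes=[1, 3])
    print(len(tokens), len(tokens[0]))  # 9 4
    features = global_average_pool(parallel_block(tokens, halve, zero))
    print(features)
```

Running the script prints the token count and token dimension for the toy input, followed by the pooled feature vector that a classifier head would consume. A real model would use learned kernels, trained attention or feed-forward sub-layers as the paired branches, and normalization layers. The sketch omits all of these.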