
CCTseg: A cascade composite transformer semantic segmentation network for UAV visual perception


Bibliographic Details
Published in: Measurement: Journal of the International Measurement Confederation, 2023-04, Vol. 211, p. 112612, Article 112612
Main Authors: Yi, Shi; Li, Junjie; Jiang, Gang; Liu, Xi; Chen, Ling
Format: Article
Language: English
Description
Summary:
• A novel cascade composite transformer semantic segmentation network, named CCTseg, is proposed for UAV visual perception.
• A cascade composite encoder, consisting of three transformer-based feature extraction backbones with cascade fusion of multi-stage features, is designed to better extract features from UAV images.
• A transformer block with an additional spatial enhanced head serves as the basic feature extraction block, so that the extracted features carry both global context information about the surroundings and local information about objects.
• A symmetric rhombus decoder is proposed to make full use of middle-level features, which contain abundant useful information about UAV images.

Semantic segmentation provides pixel-level classification of the surrounding environment and is an essential task for the visual perception of autonomous vehicles and mobile robots. Most existing semantic segmentation networks focus on visual perception for autonomous vehicles. Little attention has been paid to semantic segmentation for UAV (Unmanned Aerial Vehicle) visual perception, which is crucial to autonomous UAV flight and landing spot searching. Compared with views from autonomous vehicles, UAV-based views are more challenging for semantic segmentation because images captured by UAVs contain large variations in object size caused by differing altitudes and viewing angles. Existing semantic segmentation networks designed for autonomous vehicles are generally inadequate for extracting representative features from UAV images, which must contain context information and local information simultaneously. This study proposes a cascade composite transformer-based semantic segmentation network for UAV visual perception. A cascade composite encoder is designed, consisting of three transformer-based feature extraction backbones with cascade fusion of low-level, middle-level, and high-level features, to achieve better feature representation capacity. The spatial enhanced transformer block is implemented as the basic feature extraction block of each backbone so that the extracted features contain context information about the environment and local information about objects. A symmetric rhombus decoder is proposed to integrate multi-stage features and make full use of middle-stage features, which contain abundant useful information, so that accurate pixel-level predictions can be obtained.
ISSN: 0263-2241, 1873-412X
DOI: 10.1016/j.measurement.2023.112612