Swin-Transformer-Based YOLOv5 for Small-Object Detection in Remote Sensing Images

Bibliographic Details
Published in:	Sensors (Basel, Switzerland), 2023-03, Vol.23 (7), p.3634
Main Authors:	Cao, Xuan; Zhang, Yanwei; Lang, Song; Gong, Yan
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; attention mechanism; Boxes; Clustering; Computational linguistics; Datasets; Language processing; multi-scale feature fusion; Natural language interfaces; Remote sensing; Sensors; small-object detection; Swin Transformer; YOLOv5

Description
Summary:	This study aimed to address the problems of low detection accuracy and inaccurate positioning in small-object detection in remote sensing images. An improved architecture based on the Swin Transformer and YOLOv5 is proposed. First, Complete-IoU (CIoU) was introduced to improve the K-means clustering algorithm, and anchors of appropriate size for the dataset were then generated. Second, a modified CSPDarknet53 structure combined with the Swin Transformer was proposed to retain sufficient global context information and extract more differentiated features through multi-head self-attention. For the path-aggregation neck, a simple and efficient weighted bidirectional feature pyramid network was proposed for effective cross-scale feature fusion. In addition, an extra prediction head and new feature fusion layers were added for small objects. Finally, Coordinate Attention (CA) was introduced to the YOLOv5 network to improve the accuracy of small-object features in remote sensing images. The effectiveness of the proposed method was demonstrated by several kinds of experiments on the DOTA (Dataset for Object detection in Aerial images) dataset, on which the mean average precision reached 74.7%. Compared with YOLOv5, the proposed method improved the mean average precision (mAP) by 8.9%, achieving higher small-object detection accuracy in remote sensing images.
ISSN:	1424-8220
DOI:	10.3390/s23073634
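
Note:	The summary describes generating anchors with a K-means variant that uses Complete-IoU (CIoU). This record does not include the authors' implementation, so the following is only a minimal sketch of the general idea under stated assumptions: boxes are reduced to (width, height) pairs with a shared centre (so the centre-distance term of CIoU is zero), the clustering distance is taken as 1 - CIoU, and cluster centres are updated with the mean. The names `ciou_wh` and `kmeans_anchors` and the sample boxes are hypothetical.

```python
import math
import random


def ciou_wh(box, anchor):
    """CIoU of two (width, height) boxes assumed to share the same centre.

    With coincident centres the centre-distance penalty vanishes, so
    CIoU reduces to IoU - alpha * v, where v measures aspect-ratio mismatch.
    """
    w1, h1 = box
    w2, h2 = anchor
    inter = min(w1, w2) * min(h1, h2)
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / union
    v = (4 / math.pi ** 2) * (math.atan(w1 / h1) - math.atan(w2 / h2)) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return iou - alpha * v


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs; the nearest anchor is the one with the highest CIoU."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # assumption: random initial anchors
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Minimising 1 - CIoU is the same as maximising CIoU.
            best = max(range(k), key=lambda j: ciou_wh(box, anchors[j]))
            clusters[best].append(box)
        new_anchors = []
        for j, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[j])  # keep the anchor of an empty cluster
                continue
            mean_w = sum(w for w, _ in members) / len(members)
            mean_h = sum(h for _, h in members) / len(members)
            new_anchors.append((mean_w, mean_h))
        if new_anchors == anchors:  # assignments no longer change the centres
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first


# Hypothetical ground-truth box sizes in pixels, for illustration only.
boxes = [(10, 12), (11, 13), (12, 11), (50, 60), (55, 58), (52, 62)]
print(kmeans_anchors(boxes, k=2))
```

In a real pipeline the (width, height) pairs would come from the dataset's annotations, and k would match the number of anchors the detector expects across its prediction heads.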