Multi-Head-Self-Attention based YOLOv5X-transformer for multi-scale object detection
Published in: Multimedia Tools and Applications, 2024-04, Vol. 83 (12), pp. 36491-36517
Main Authors: ,
Format: Article
Language: English
Summary: State-of-the-art deep learning models have long relied on region-proposal and grid-based methods to detect and localize objects, yet there is still scope for improvement: small-scale object detection remains a visual challenge. To address it, existing methods create and enhance shallow features or apply feature fusion. However, progressive enhancement of shallow features degrades image quality, and feature fusion causes an aliasing effect. To resolve this aliasing effect in small-scale object detection, this paper introduces the YOLOv5X-transformer. In this model, a Multi-Head-Self-Attention (MHSA) module extracts in-depth information from the feature maps based on query, key, and value parameters. These feature maps are then pooled at five different scales using Spatial Pyramid Pooling-Faster (SPPF) to improve their quality. To preserve spatial information and locate pixels correctly, a Path Aggregation Network (PANet) is used as the neck. The model is experimentally verified on the PASCAL dataset, achieving 87.7% mAP, 85.2% precision, and 81.4% recall. These results show that the proposed model outperforms existing models in detecting small objects.
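The core mechanism named in the summary, multi-head self-attention over a convolutional feature map via query, key, and value projections, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the feature-map shape, head count, and random projection weights are assumptions introduced here for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(feature_map, num_heads=4, seed=0):
    """Multi-head self-attention over an (H, W, C) feature map.

    Spatial positions are flattened into tokens; channels are split
    across heads. Projection weights are random placeholders standing
    in for learned parameters.
    """
    H, W, C = feature_map.shape
    assert C % num_heads == 0
    d = C // num_heads                         # per-head channel dimension
    rng = np.random.default_rng(seed)
    x = feature_map.reshape(H * W, C)          # (tokens, channels)
    # Illustrative random query/key/value projection matrices.
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Reshape to (heads, tokens, d) so each head attends independently.
    split = lambda t: t.reshape(H * W, num_heads, d).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # Scaled dot-product attention per head: (heads, tokens, tokens).
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))
    out = (attn @ v).transpose(1, 0, 2).reshape(H * W, C)
    return out.reshape(H, W, C)                # same shape as the input

fmap = np.random.default_rng(1).standard_normal((8, 8, 16))
out = mhsa(fmap)
print(out.shape)  # (8, 8, 16)
```

The output keeps the input's spatial shape, so the attended map can be fed onward (e.g. to a pooling stage such as SPPF) exactly like an ordinary feature map.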
ISSN: 1380-7501 (print); 1573-7721 (electronic)
DOI: 10.1007/s11042-023-15773-4