
AF-DETR: efficient UAV small object detector via Assemble-and-Fusion mechanism



Bibliographic Details
Published in: Pattern Analysis and Applications (PAA), 2024-12, Vol. 27 (4), Article 135
Main Authors: Ren, Lingfei; Lei, Huan; Li, Zhongxu; Yang, Wenyuan
Format: Article
Language: English
Description
Summary: With the rise of deep learning, object detection for unmanned aerial vehicles (UAVs) has achieved outstanding performance in many application scenarios. However, current small object detection approaches largely disregard sparse feature interactions and global context modeling, which leads to incomplete use, and even loss, of the semantic information of small objects. This study therefore proposes an Assemble-and-Fusion mechanism for the DEtection TRansformer (AF-DETR), in which aggregated global semantics are allocated across layers to enhance fine-grained feature learning for small instances. In addition, an adaptive context broadcasting module is designed to integrate contextual information in the decoder, thereby enabling accurate detection of small objects. First, features from the last four stages of the backbone are fed into an intra-scale feature interaction module, which applies self-attention to the feature map of the last scale. Second, a fixed fusion module aligns and aggregates the multi-scale representations before distributing them across layers; features of adjacent levels are then transformed and consolidated within a convolutional module. Finally, an enhanced adaptive context broadcasting module is introduced into the decoder's MLP to incorporate aggregated semantics into individual tokens, broadcasting contextual information. AF-DETR achieves 49.5% mAP50 and 29.5% mAP50-95 on the VisDrone2021 dataset, and mAP50 scores of 67.7% and 70.7% under the RGB and infrared modalities, respectively, on the DroneVehicle dataset. Extensive evaluations on multiple UAV perception benchmarks containing small objects under complex real-world conditions show consistent gains over state-of-the-art methods across various metrics.
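The summary describes the data flow but not the layer definitions. The following is a toy, pure-Python sketch of the three ideas it names, not the authors' code. Assumptions: Q/K/V projections are identities, feature maps are single-channel square grids, alignment uses average pooling or nearest-neighbour upsampling, "assembling" is a plain mean of aligned levels, and the "adaptive" broadcasting weight is replaced by a fixed scalar `alpha` (adding a scaled mean token to every token, in the spirit of context broadcasting). All function names (`self_attention`, `resize`, `assemble`, `distribute`, `context_broadcast`) are hypothetical, and the convolutional consolidation of adjacent levels is omitted.

```python
import math

Grid = list[list[float]]    # toy assumption: single-channel H x W feature map
Tokens = list[list[float]]  # N tokens, each a d-dimensional vector


def self_attention(tokens: Tokens) -> Tokens:
    """Single-head scaled dot-product self-attention.

    Q, K and V projections are identities (a simplification, not AF-DETR's layer).
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max score for numerical stability
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) / total
                    for j in range(d)])
    return out


def resize(grid: Grid, size: int) -> Grid:
    """Average-pool down, or nearest-neighbour upsample, to size x size.

    Assumes square maps whose sides divide evenly (e.g. powers of two).
    """
    n = len(grid)
    if n >= size:
        f = n // size
        return [[sum(grid[i * f + a][j * f + b] for a in range(f) for b in range(f)) / (f * f)
                 for j in range(size)] for i in range(size)]
    f = size // n
    return [[grid[i // f][j // f] for j in range(size)] for i in range(size)]


def assemble(levels: list[Grid], size: int) -> Grid:
    """Align every level to a common resolution and average them into one global map."""
    aligned = [resize(g, size) for g in levels]
    return [[sum(g[i][j] for g in aligned) / len(aligned) for j in range(size)]
            for i in range(size)]


def distribute(levels: list[Grid], global_map: Grid) -> list[Grid]:
    """Resize the global map back to each level's resolution and add it as a residual."""
    fused = []
    for g in levels:
        back = resize(global_map, len(g))
        fused.append([[g[i][j] + back[i][j] for j in range(len(g))] for i in range(len(g))])
    return fused


def context_broadcast(tokens: Tokens, alpha: float = 0.5) -> Tokens:
    """Add alpha * (mean token) to every token; fixed alpha stands in for the adaptive weight."""
    d = len(tokens[0])
    mean = [sum(t[j] for t in tokens) / len(tokens) for j in range(d)]
    return [[t[j] + alpha * mean[j] for j in range(d)] for t in tokens]


if __name__ == "__main__":
    # Four toy "backbone stages" at 16x16, 8x8, 4x4 and 2x2.
    levels = [[[float((i + j) % 3) for j in range(n)] for i in range(n)] for n in (16, 8, 4, 2)]
    last = levels[-1]
    n = len(last)
    # Intra-scale interaction: self-attention on the last scale only (1-dim tokens here).
    attended = self_attention([[v] for row in last for v in row])
    levels[-1] = [[attended[i * n + j][0] for j in range(n)] for i in range(n)]
    fused = distribute(levels, assemble(levels, size=4))
    print([len(g) for g in fused])  # → [16, 8, 4, 2]
    print(context_broadcast([[1.0, 2.0], [3.0, 4.0]]))  # → [[2.0, 3.5], [4.0, 5.5]]
```

Averaging aligned levels is simply the most basic way to "assemble" multi-scale features; the paper's fixed fusion module and convolutional consolidation may well use learned layers instead, so this sketch only illustrates the shape of the data flow.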
ISSN: 1433-7541, 1433-755X
DOI: 10.1007/s10044-024-01349-x