YOLOv7F: Enhanced YOLOv7 With Guided Feature Fusion

Bibliographic Details
Published in: IEEE Access, 2024, Vol. 12, pp. 169487-169498
Main Authors: Kim, Haemoon, Park, Seonghyun, Kim, Hyunhak, Ahn, Jongsik, Lee, Tae-Young, Ha, Yunchul, Choi, Byungin
Format: Article
Language: English
Description
Summary: A general object detector has a high misdetection rate for small objects. Although many small-object detectors address the insufficient representation of objects, their performance remains poor on very tiny objects in aerial images that closely resemble other objects and the background. In this study, we analyze the spatial and semantic misalignment of features caused by resizing, i.e., the interpolation and pooling operations performed before multi-scale feature fusion. Additionally, the objectness loss uses IoU values as a learning target, and these values are sensitive to minute distance differences between the detector's predicted small objects and the ground truth. Therefore, the neck and head architecture of the proposed You Only Look Once version 7 Fusion (YOLOv7F) model is redesigned for small-object detection. YOLOv7F includes the Deformable Feature Fusion (DFF) module, which aligns features based on guided features, and the Objectness Refinement Head (ORH), which refines the predicted objectness score. YOLOv7F achieved 63.9% mAP@0.5, a 4.1% improvement over the YOLOv7X model, on AI-TODv2, where small objects account for 98.1% of all instances. On the VisDrone2019-DET dataset, where 32.0% of instances are larger than a medium-sized object, YOLOv7F achieved an mAP@0.5 of 63.9%, a 2.0% improvement over YOLOv7X.
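The summary's point about IoU sensitivity can be illustrated with a short sketch (not from the paper; the box sizes and the 2-pixel shift are illustrative): the same localization error that barely affects a large box's IoU pushes a tiny box's IoU below a typical 0.5 matching threshold.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

shift = 2  # the same 2-pixel localization error for both box sizes
tiny  = iou((0, 0, 8, 8),   (shift, shift, 8 + shift, 8 + shift))
large = iou((0, 0, 64, 64), (shift, shift, 64 + shift, 64 + shift))
print(f"tiny 8x8 box:    IoU = {tiny:.3f}")   # ~0.391, below a 0.5 threshold
print(f"large 64x64 box: IoU = {large:.3f}")  # ~0.884, barely affected
```

This is why an IoU-driven objectness target is noisy for tiny objects, motivating a head that refines the predicted objectness score.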
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3486766