
AFRNet: adaptive feature refinement network

Bibliographic Details
Published in: Signal, Image and Video Processing, 2024-11, Vol. 18 (11), pp. 7779-7788
Main Authors: Zhang, Jilong; Yang, Yanjiao; Liu, Jienan; Jiang, Jing; Ma, Mei
Format: Article
Language:English
Description
Summary: In the domain of computer vision, object detection is a fundamental task aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structures. To overcome these limitations, we have designed an innovative model architecture known as the Adaptive Feature Refinement Network (AFRNet). On one hand, the model discards the conventional Feature Pyramid Network structure and adopts a novel neck that combines the Scale Sequence Feature Fusion (SSFF) module with the Gather-and-Distribute (GD) mechanism; experiments demonstrate that SSFF further enhances the multi-scale feature fusion of the GD mechanism, thereby improving object detection performance. On the other hand, to address the limitations of existing models in modeling geometric transformations, we have designed an advanced deformable convolution structure called Attentive Deformable ConvNet, which integrates an improved attention mechanism to capture key features in images more precisely. Extensive experiments on the MS-COCO dataset validate the effectiveness of our model: in single-model, single-scale testing, it achieves an Average Precision (AP) of 51.8%, underscoring a significant enhancement in object detection performance.
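
The SSFF idea summarized above is concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration, assuming the commonly described SSFF design: multi-scale features are resized to a common resolution, stacked along a new scale axis, and fused with a 3D convolution. The class and parameter names (ScaleSequenceFusion, channels, num_scales) are illustrative and not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScaleSequenceFusion(nn.Module):
        """Hypothetical sketch of Scale Sequence Feature Fusion (SSFF).

        Multi-scale feature maps are resized to the finest resolution,
        stacked along a new "scale" axis, and fused with a 3D convolution,
        so the kernel mixes information across scales as well as space.
        """

        def __init__(self, channels: int, num_scales: int = 3):
            super().__init__()
            # The 3D conv treats the scale axis as depth; a kernel depth
            # equal to num_scales collapses that axis in one step.
            self.fuse = nn.Conv3d(channels, channels,
                                  kernel_size=(num_scales, 3, 3),
                                  padding=(0, 1, 1))
            self.norm = nn.BatchNorm2d(channels)

        def forward(self, feats):
            # feats: list of (N, C, H_i, W_i) tensors, ordered fine -> coarse.
            h, w = feats[0].shape[-2:]
            resized = [F.interpolate(f, size=(h, w), mode="nearest")
                       for f in feats]
            # Stack along a new scale dimension: (N, C, S, H, W).
            stack = torch.stack(resized, dim=2)
            fused = self.fuse(stack).squeeze(2)  # back to (N, C, H, W)
            return F.relu(self.norm(fused))

For example, ScaleSequenceFusion(256)([p3, p4, p5]) would fuse three 256-channel pyramid levels into a single map at the resolution of p3, which could then feed a GD-style gather step.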
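
The abstract does not specify how Attentive Deformable ConvNet combines attention with deformable sampling, so the following is only a plausible sketch: a DCNv2-style modulated deformable convolution built on torchvision.ops.DeformConv2d, where a sigmoid-gated mask acts as per-sampling-point attention. The module name AttentiveDeformConv and its internals are assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn
    from torchvision.ops import DeformConv2d

    class AttentiveDeformConv(nn.Module):
        """Hypothetical sketch: a deformable convolution whose sampling
        points are re-weighted by a learned attention mask. The actual
        Attentive Deformable ConvNet in AFRNet may differ."""

        def __init__(self, in_ch: int, out_ch: int, k: int = 3):
            super().__init__()
            # One (dy, dx) offset pair per kernel position.
            self.offset = nn.Conv2d(in_ch, 2 * k * k, k, padding=k // 2)
            # One attention weight per kernel position, squashed to (0, 1).
            self.mask = nn.Conv2d(in_ch, k * k, k, padding=k // 2)
            self.dconv = DeformConv2d(in_ch, out_ch, k, padding=k // 2)

        def forward(self, x):
            offset = self.offset(x)             # (N, 2*k*k, H, W)
            mask = torch.sigmoid(self.mask(x))  # (N, k*k, H, W)
            return self.dconv(x, offset, mask)

A call such as AttentiveDeformConv(256, 256)(feature_map) then behaves like a 3x3 convolution whose sampling grid deforms per location and whose samples are attention-weighted before aggregation.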
ISSN: 1863-1703 (print); 1863-1711 (electronic)
DOI: 10.1007/s11760-024-03427-3