AFRNet: adaptive feature refinement network

In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made...

Bibliographic Details
Published in:Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788
Main Authors: Zhang, Jilong, Yang, Yanjiao, Liu, Jienan, Jiang, Jing, Ma, Mei
Format: Article
Language:English
cited_by
cites cdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633
container_end_page 7788
container_issue 11
container_start_page 7779
container_title Signal, image and video processing
container_volume 18
creator Zhang, Jilong
Yang, Yanjiao
Liu, Jienan
Jiang, Jing
Ma, Mei
description In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of the Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, we have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.
doi_str_mv 10.1007/s11760-024-03427-3
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3104475851</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3104475851</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</originalsourceid><addsrcrecordid>eNp9kE9LAzEQxYMoWGq_gKcFjxKdyWQ3qbdS_AdFQfQcsruJtNrdmqSK397UFb05lxmG994MP8aOEc4QQJ1HRFUBByE5kBSK0x4boa6Io0Lc_52BDtkkxhXkIqF0pUfsdHb1cOfSRWFbu0nLd1d4Z9M2uCI4v-zc2nWp6Fz66MPLETvw9jW6yU8fs6ery8f5DV_cX9_OZwveCIDEraqn4Btbe0uKwOtGiBaVwryiWol2KkvZuppUSaStLC1anApN3irZVERjdjLkbkL_tnUxmVW_DV0-aQhBSlXqErNKDKom9DHmb80mLNc2fBoEs-NiBi4mczHfXMwumgZTzOLu2YW_6H9cXzUwY5I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3104475851</pqid></control><display><type>article</type><title>AFRNet: adaptive feature refinement network</title><source>Springer Nature</source><creator>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</creator><creatorcontrib>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</creatorcontrib><description>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</description><identifier>ISSN: 1863-1703</identifier><identifier>EISSN: 1863-1711</identifier><identifier>DOI: 10.1007/s11760-024-03427-3</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Adaptive sampling ; Computer Imaging ; Computer Science ; Computer vision ; Data integration ; Deformation effects ; Effectiveness ; Formability ; Geometric transformation ; Image enhancement ; Image Processing and Computer Vision ; Multimedia Information Systems ; Object recognition ; Original Paper ; Pattern Recognition and Graphics ; Signal,Image and Speech Processing ; Target detection ; Vision</subject><ispartof>Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Zhang, Jilong</creatorcontrib><creatorcontrib>Yang, Yanjiao</creatorcontrib><creatorcontrib>Liu, Jienan</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Ma, Mei</creatorcontrib><title>AFRNet: adaptive feature refinement network</title><title>Signal, image and video processing</title><addtitle>SIViP</addtitle><description>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</description><subject>Adaptive sampling</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer vision</subject><subject>Data integration</subject><subject>Deformation effects</subject><subject>Effectiveness</subject><subject>Formability</subject><subject>Geometric transformation</subject><subject>Image enhancement</subject><subject>Image Processing and Computer Vision</subject><subject>Multimedia Information Systems</subject><subject>Object recognition</subject><subject>Original Paper</subject><subject>Pattern Recognition and Graphics</subject><subject>Signal,Image and Speech Processing</subject><subject>Target 
detection</subject><subject>Vision</subject><issn>1863-1703</issn><issn>1863-1711</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LAzEQxYMoWGq_gKcFjxKdyWQ3qbdS_AdFQfQcsruJtNrdmqSK397UFb05lxmG994MP8aOEc4QQJ1HRFUBByE5kBSK0x4boa6Io0Lc_52BDtkkxhXkIqF0pUfsdHb1cOfSRWFbu0nLd1d4Z9M2uCI4v-zc2nWp6Fz66MPLETvw9jW6yU8fs6ery8f5DV_cX9_OZwveCIDEraqn4Btbe0uKwOtGiBaVwryiWol2KkvZuppUSaStLC1anApN3irZVERjdjLkbkL_tnUxmVW_DV0-aQhBSlXqErNKDKom9DHmb80mLNc2fBoEs-NiBi4mczHfXMwumgZTzOLu2YW_6H9cXzUwY5I</recordid><startdate>20241101</startdate><enddate>20241101</enddate><creator>Zhang, Jilong</creator><creator>Yang, Yanjiao</creator><creator>Liu, Jienan</creator><creator>Jiang, Jing</creator><creator>Ma, Mei</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20241101</creationdate><title>AFRNet: adaptive feature refinement network</title><author>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Adaptive sampling</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer vision</topic><topic>Data integration</topic><topic>Deformation effects</topic><topic>Effectiveness</topic><topic>Formability</topic><topic>Geometric transformation</topic><topic>Image enhancement</topic><topic>Image Processing and Computer Vision</topic><topic>Multimedia Information Systems</topic><topic>Object recognition</topic><topic>Original Paper</topic><topic>Pattern Recognition and Graphics</topic><topic>Signal,Image and Speech Processing</topic><topic>Target 
detection</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Jilong</creatorcontrib><creatorcontrib>Yang, Yanjiao</creatorcontrib><creatorcontrib>Liu, Jienan</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Ma, Mei</creatorcontrib><collection>CrossRef</collection><jtitle>Signal, image and video processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Jilong</au><au>Yang, Yanjiao</au><au>Liu, Jienan</au><au>Jiang, Jing</au><au>Ma, Mei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AFRNet: adaptive feature refinement network</atitle><jtitle>Signal, image and video processing</jtitle><stitle>SIViP</stitle><date>2024-11-01</date><risdate>2024</risdate><volume>18</volume><issue>11</issue><spage>7779</spage><epage>7788</epage><pages>7779-7788</pages><issn>1863-1703</issn><eissn>1863-1711</eissn><abstract>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11760-024-03427-3</doi><tpages>10</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1863-1703
ispartof Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788
issn 1863-1703
1863-1711
language eng
recordid cdi_proquest_journals_3104475851
source Springer Nature
subjects Adaptive sampling
Computer Imaging
Computer Science
Computer vision
Data integration
Deformation effects
Effectiveness
Formability
Geometric transformation
Image enhancement
Image Processing and Computer Vision
Multimedia Information Systems
Object recognition
Original Paper
Pattern Recognition and Graphics
Signal,Image and Speech Processing
Target detection
Vision
title AFRNet: adaptive feature refinement network
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A51%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AFRNet:%20adaptive%20feature%20refinement%20network&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Zhang,%20Jilong&rft.date=2024-11-01&rft.volume=18&rft.issue=11&rft.spage=7779&rft.epage=7788&rft.pages=7779-7788&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03427-3&rft_dat=%3Cproquest_cross%3E3104475851%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3104475851&rft_id=info:pmid/&rfr_iscdi=true