AFRNet: adaptive feature refinement network
In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made...
Published in: | Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788 |
---|---|
Main Authors: | Zhang, Jilong; Yang, Yanjiao; Liu, Jienan; Jiang, Jing; Ma, Mei |
Format: | Article |
Language: | English |
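Subjects: | Adaptive sampling; Computer Imaging; Computer Science; Computer vision; Data integration; Deformation effects; Effectiveness; Formability; Geometric transformation; Image enhancement; Image Processing and Computer Vision; Multimedia Information Systems; Object recognition; Original Paper; Pattern Recognition and Graphics; Signal,Image and Speech Processing; Target detection; Vision |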
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633 |
container_end_page | 7788 |
container_issue | 11 |
container_start_page | 7779 |
container_title | Signal, image and video processing |
container_volume | 18 |
creator | Zhang, Jilong; Yang, Yanjiao; Liu, Jienan; Jiang, Jing; Ma, Mei |
description | In computer vision, object detection is a fundamental task that aims to accurately identify and localize objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed a model architecture known as Adaptive Feature Refinement Network (AFRNet). On the one hand, the model discards the conventional Feature Pyramid Network structure and adopts a novel neck structure that incorporates the Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. Experiments demonstrate that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, we have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model. |
doi_str_mv | 10.1007/s11760-024-03427-3 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3104475851</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3104475851</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</originalsourceid><addsrcrecordid>eNp9kE9LAzEQxYMoWGq_gKcFjxKdyWQ3qbdS_AdFQfQcsruJtNrdmqSK397UFb05lxmG994MP8aOEc4QQJ1HRFUBByE5kBSK0x4boa6Io0Lc_52BDtkkxhXkIqF0pUfsdHb1cOfSRWFbu0nLd1d4Z9M2uCI4v-zc2nWp6Fz66MPLETvw9jW6yU8fs6ery8f5DV_cX9_OZwveCIDEraqn4Btbe0uKwOtGiBaVwryiWol2KkvZuppUSaStLC1anApN3irZVERjdjLkbkL_tnUxmVW_DV0-aQhBSlXqErNKDKom9DHmb80mLNc2fBoEs-NiBi4mczHfXMwumgZTzOLu2YW_6H9cXzUwY5I</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3104475851</pqid></control><display><type>article</type><title>AFRNet: adaptive feature refinement network</title><source>Springer Nature</source><creator>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</creator><creatorcontrib>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</creatorcontrib><description>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</description><identifier>ISSN: 1863-1703</identifier><identifier>EISSN: 1863-1711</identifier><identifier>DOI: 10.1007/s11760-024-03427-3</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Adaptive sampling ; Computer Imaging ; Computer Science ; Computer vision ; Data integration ; Deformation effects ; Effectiveness ; Formability ; Geometric transformation ; Image enhancement ; Image Processing and Computer Vision ; Multimedia Information Systems ; Object recognition ; Original Paper ; Pattern Recognition and Graphics ; Signal,Image and Speech Processing ; Target detection ; Vision</subject><ispartof>Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Zhang, Jilong</creatorcontrib><creatorcontrib>Yang, Yanjiao</creatorcontrib><creatorcontrib>Liu, Jienan</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Ma, Mei</creatorcontrib><title>AFRNet: adaptive feature refinement network</title><title>Signal, image and video processing</title><addtitle>SIViP</addtitle><description>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</description><subject>Adaptive sampling</subject><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Computer vision</subject><subject>Data integration</subject><subject>Deformation effects</subject><subject>Effectiveness</subject><subject>Formability</subject><subject>Geometric transformation</subject><subject>Image enhancement</subject><subject>Image Processing and Computer Vision</subject><subject>Multimedia Information Systems</subject><subject>Object recognition</subject><subject>Original Paper</subject><subject>Pattern Recognition and Graphics</subject><subject>Signal,Image and Speech Processing</subject><subject>Target 
detection</subject><subject>Vision</subject><issn>1863-1703</issn><issn>1863-1711</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9kE9LAzEQxYMoWGq_gKcFjxKdyWQ3qbdS_AdFQfQcsruJtNrdmqSK397UFb05lxmG994MP8aOEc4QQJ1HRFUBByE5kBSK0x4boa6Io0Lc_52BDtkkxhXkIqF0pUfsdHb1cOfSRWFbu0nLd1d4Z9M2uCI4v-zc2nWp6Fz66MPLETvw9jW6yU8fs6ery8f5DV_cX9_OZwveCIDEraqn4Btbe0uKwOtGiBaVwryiWol2KkvZuppUSaStLC1anApN3irZVERjdjLkbkL_tnUxmVW_DV0-aQhBSlXqErNKDKom9DHmb80mLNc2fBoEs-NiBi4mczHfXMwumgZTzOLu2YW_6H9cXzUwY5I</recordid><startdate>20241101</startdate><enddate>20241101</enddate><creator>Zhang, Jilong</creator><creator>Yang, Yanjiao</creator><creator>Liu, Jienan</creator><creator>Jiang, Jing</creator><creator>Ma, Mei</creator><general>Springer London</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20241101</creationdate><title>AFRNet: adaptive feature refinement network</title><author>Zhang, Jilong ; Yang, Yanjiao ; Liu, Jienan ; Jiang, Jing ; Ma, Mei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Adaptive sampling</topic><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Computer vision</topic><topic>Data integration</topic><topic>Deformation effects</topic><topic>Effectiveness</topic><topic>Formability</topic><topic>Geometric transformation</topic><topic>Image enhancement</topic><topic>Image Processing and Computer Vision</topic><topic>Multimedia Information Systems</topic><topic>Object recognition</topic><topic>Original Paper</topic><topic>Pattern Recognition and Graphics</topic><topic>Signal,Image and Speech Processing</topic><topic>Target 
detection</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zhang, Jilong</creatorcontrib><creatorcontrib>Yang, Yanjiao</creatorcontrib><creatorcontrib>Liu, Jienan</creatorcontrib><creatorcontrib>Jiang, Jing</creatorcontrib><creatorcontrib>Ma, Mei</creatorcontrib><collection>CrossRef</collection><jtitle>Signal, image and video processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zhang, Jilong</au><au>Yang, Yanjiao</au><au>Liu, Jienan</au><au>Jiang, Jing</au><au>Ma, Mei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>AFRNet: adaptive feature refinement network</atitle><jtitle>Signal, image and video processing</jtitle><stitle>SIViP</stitle><date>2024-11-01</date><risdate>2024</risdate><volume>18</volume><issue>11</issue><spage>7779</spage><epage>7788</epage><pages>7779-7788</pages><issn>1863-1703</issn><eissn>1863-1711</eissn><abstract>In the domain of computer vision, object detection is a fundamental task, aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once, Adaptive Training Sample Selection, and Task-aligned One-stage Object Detection have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structure. To overcome these limitations, we have designed an innovative model architecture known as Adaptive Feature Refinement Network (AFRNet). The model, on one hand, discards the conventional Feature Pyramid Network structure and designs a novel neck structure that incorporates the structures of Scale Sequence Feature Fusion (SSFF) model and the Gather-and-Distribute (GD) mechanism. 
Through experimentation, it has been demonstrated that the SSFF method can further enhance the multi-scale feature fusion of the GD mechanism, thereby improving the performance of the target detection task. On the other hand, to address the constraints of existing models in simulating geometric transformations, We have designed an advanced variable convolution structure called Attentive Deformable ConvNet. This structure integrates an improved attention mechanism, which allows for more precise capture of key features in images. Extensive experiments conducted on the MS-COCO dataset have validated the effectiveness of our model. In single-model, single-scale testing, our model achieved an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance and confirms the efficacy of our model.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11760-024-03427-3</doi><tpages>10</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2024-11, Vol.18 (11), p.7779-7788 |
issn | 1863-1703; 1863-1711 |
language | eng |
recordid | cdi_proquest_journals_3104475851 |
source | Springer Nature |
subjects | Adaptive sampling; Computer Imaging; Computer Science; Computer vision; Data integration; Deformation effects; Effectiveness; Formability; Geometric transformation; Image enhancement; Image Processing and Computer Vision; Multimedia Information Systems; Object recognition; Original Paper; Pattern Recognition and Graphics; Signal,Image and Speech Processing; Target detection; Vision |
title | AFRNet: adaptive feature refinement network |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T20%3A51%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=AFRNet:%20adaptive%20feature%20refinement%20network&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Zhang,%20Jilong&rft.date=2024-11-01&rft.volume=18&rft.issue=11&rft.spage=7779&rft.epage=7788&rft.pages=7779-7788&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03427-3&rft_dat=%3Cproquest_cross%3E3104475851%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c200t-a7b90fcabfa3730f8c22d1771cab3b72d9454deb375338a45a1a19283fa74c633%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3104475851&rft_id=info:pmid/&rfr_iscdi=true |