Video object detection via space–time feature aggregation and result reuse

Bibliographic Details
Published in: IET Image Processing, 2024-10, Vol. 18 (12), p. 3356-3367
Main Authors: Duan, Liang; Yang, Rongfei; Yue, Kun; Sun, Zhengbao; Yuan, Guowu
Format: Article
Language: English
Subjects: feature extraction; object detection
ISSN: 1751-9659
EISSN: 1751-9667
DOI: 10.1049/ipr2.13179
Publisher: Wiley
Rights: 2024 The Author(s). Published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.
Source: IET Digital Library Journals; Wiley Open Access
Online Access: https://doi.org/10.1049/ipr2.13179
Abstract: When detecting objects in videos, motion often degrades object appearance through blurring and occlusion, and can leave objects in unusual shapes and poses. Consequently, running an image object detection model frame by frame leads to a decline in accuracy. This paper proposes an online video object detection method based on the one-stage detector YOLOx. First, a space–time feature aggregation (STFA) module is introduced, which retrieves highly relevant space–time features from the features of past (memory) frames and aggregates them into the current frame to enhance its feature quality. Then, a result reuse module is introduced, which carries the stable detection results of past frames over to the current frame to improve detection stability. Together, the two modules use the space–time information and detection results of past frames in the video sequence to achieve a trade-off between the accuracy and speed of video object detection. Experimental results on ImageNet VID show improvements in both speed and accuracy.
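The STFA description above (retrieving highly relevant space–time features from memory frames and aggregating them into the current frame) reads like an attention-style retrieval. The record gives no equations, so the following is a minimal sketch assuming dot-product attention with residual fusion; the function name, tensor shapes, and fusion rule are illustrative assumptions, not the paper's actual formulation.

```python
# Hypothetical sketch of a space-time feature aggregation (STFA) step.
# Assumes dot-product attention over a memory bank of past-frame features;
# the paper's exact mechanism is not given in this record.
import torch
import torch.nn.functional as F

def stfa_aggregate(current: torch.Tensor, memory: torch.Tensor,
                   temperature: float = 1.0) -> torch.Tensor:
    """Enhance current-frame features with relevant memory-frame features.

    current: (N, C) flattened feature vectors of the current frame
    memory:  (M, C) feature vectors collected from past (memory) frames
    """
    # Cosine-normalized similarity between every current-frame location
    # and every memory feature.
    q = F.normalize(current, dim=-1)
    k = F.normalize(memory, dim=-1)
    sim = (q @ k.t()) / temperature          # (N, M)

    # Softmax over the memory axis picks out the most relevant
    # space-time features for each current-frame location.
    attn = sim.softmax(dim=-1)               # (N, M)
    retrieved = attn @ memory                # (N, C)

    # Residual fusion back into the current frame (an assumption; the
    # paper may use a learned fusion instead).
    return current + retrieved
```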
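Result reuse is described as carrying stable past-frame detections over to stabilize the current frame. A common way to realize this is IoU matching followed by score smoothing; the sketch below assumes exactly that, so the threshold, the smoothing factor, and the update rule are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of result reuse via IoU matching. The matching rule
# and score update are assumptions, not the paper's exact procedure.
import torch
from torchvision.ops import box_iou

def reuse_results(curr_boxes, curr_scores, past_boxes, past_scores,
                  iou_thr: float = 0.5, alpha: float = 0.5):
    """Stabilize current detections using stable results from a past frame.

    curr_boxes / past_boxes:   (N, 4) / (M, 4) boxes in (x1, y1, x2, y2)
    curr_scores / past_scores: (N,)   / (M,)   confidence scores
    """
    if curr_boxes.numel() == 0 or past_boxes.numel() == 0:
        return curr_scores

    iou = box_iou(curr_boxes, past_boxes)    # (N, M) pairwise IoU
    best_iou, best_idx = iou.max(dim=1)      # best past match per current box

    # Where a current box overlaps a stable past detection, smooth its
    # confidence toward the past score to damp frame-to-frame flicker.
    matched = best_iou >= iou_thr
    smoothed = curr_scores.clone()
    smoothed[matched] = (alpha * curr_scores[matched]
                         + (1.0 - alpha) * past_scores[best_idx[matched]])
    return smoothed
```

Because the matched boxes already passed detection in the current frame, this variant only adjusts confidences; a stronger variant could also re-inject past boxes that the current frame missed, which is closer to reusing results outright.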