Feature flow: In-network feature flow estimation for video object detection
Published in: Pattern recognition, 2022-02, Vol. 122, p. 108323, Article 108323
Main Authors: Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao
Format: Article
Language: English
creator | Jin, Ruibing; Lin, Guosheng; Wen, Changyun; Wang, Jianliang; Liu, Fayao |
description | • A shallow module is proposed to directly predict feature flow for feature alignment in a single network. • Self-supervised learning is introduced to further improve the quality of the predicted feature flow. • New state-of-the-art performance is demonstrated in comparison with other methods, while a fast inference speed is maintained.
Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art approaches solve problems directly at the feature level. Since the displacement of a feature vector is not consistent with the pixel displacement, a common approach is to feed optical flow to a neural network and fine-tune that network on the task dataset. The expectation is that the fine-tuned network produces tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an In-network Feature Flow estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module directly produces feature flow, which indicates the feature displacement. The IFF module is a shallow module that shares features with the detection branches. This compact design enables IFF-Net to detect objects accurately while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on self-supervision, which further improves the performance of IFF-Net. IFF-Net outperforms existing methods and achieves new state-of-the-art performance on ImageNet VID. |
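The abstract describes aligning features across frames by warping them with a predicted feature flow (a per-location displacement field). As a rough illustration of the warping step such flow-based alignment relies on — a minimal NumPy sketch, not the authors' implementation; the function name and array shapes are hypothetical:

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a feature map with a per-location flow field via bilinear sampling.

    feat: (C, H, W) feature map from a reference frame.
    flow: (2, H, W) displacement field in feature-grid units;
          flow[0] is the x-displacement, flow[1] the y-displacement.
    Returns the warped (C, H, W) feature map.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Source coordinates for each target location, clamped to the grid.
    sx = np.clip(xs + flow[0], 0, W - 1)
    sy = np.clip(ys + flow[1], 0, H - 1)
    x0, y0 = np.floor(sx).astype(int), np.floor(sy).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    # Bilinear interpolation, broadcast over the channel dimension.
    return (feat[:, y0, x0] * (1 - wx) * (1 - wy)
            + feat[:, y0, x1] * wx * (1 - wy)
            + feat[:, y1, x0] * (1 - wx) * wy
            + feat[:, y1, x1] * wx * wy)
```

With zero flow the warp is the identity; an integer flow shifts the map. The point of the paper's IFF module, per the abstract, is that the flow fed to such a warp is predicted in-network at the feature level rather than derived from a pre-trained optical-flow network.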
doi_str_mv | 10.1016/j.patcog.2021.108323 |
identifier | ISSN: 0031-3203; EISSN: 1873-5142 |
source | ScienceDirect Freedom Collection 2022-2024 |
subjects | Deep convolutional neural network (DCNN); Feature flow; Object detection; Video analysis; Video object detection |