Self-Sufficient Feature Enhancing Networks for Video Salient Object Detection

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2023, Vol. 25, pp. 557-571
Main Authors: Kong, Yongqiang; Wang, Yunhong; Li, Annan; Huang, Qiuyu
Format: Article
Language: English
Description: Detecting salient objects in videos is a challenging task. Current state-of-the-art methods are dominated by motion-based deep neural networks, in which optical flow is often leveraged as the motion representation. Although robust, these optical-flow-based video salient object detection methods face at least two problems that can hinder their generalization and application. First, computing optical flow as a pre-processing step does not support direct end-to-end learning; second, little attention has been paid to the quality of visual features because of the high computational cost of spatiotemporal feature encoding. In this paper, we propose a novel self-sufficient feature enhancing network (SFENet) for video salient object detection, which leverages optical flow estimation as an auxiliary task while remaining end-to-end trainable. Through a joint training scheme for both salient object detection and optical flow estimation, its multi-task architecture is entirely self-sufficient, achieving good performance without any pre-processing. Furthermore, to improve feature quality, we design four lightweight modules in the spatial and temporal domains: cross-layer fusion, multi-level warping, spatial-channel attention, and boundary-aware refinement. The proposed method is evaluated through extensive experiments on five video salient object detection datasets. Experimental results show that our SFENet can be trained easily with fast convergence, and it significantly outperforms previous methods across various evaluation metrics. Moreover, with optical flow estimation and unsupervised video object segmentation as example applications, our method also yields state-of-the-art results on standard datasets.
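Two ideas in this description lend themselves to a concrete illustration: the joint training scheme, which optimizes a weighted sum of a saliency loss and a flow loss, and the multi-level warping module, which aligns features from a neighboring frame using the estimated flow. Below is a minimal PyTorch sketch of both. The module names (JointSaliencyFlowNet, warp), layer sizes, and loss weight lam are hypothetical placeholders for illustration, not the paper's actual SFENet design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a (B, C, H, W) feature map with a (B, 2, H, W) flow field
    via bilinear sampling -- the core operation of a warping module."""
    b, _, h, w = feat.shape
    # Build a pixel-coordinate grid and offset it by the flow (dx, dy).
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(feat.device)  # (2, H, W)
    coords = grid.unsqueeze(0) + flow                            # (B, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B, H, W, 2)
    return F.grid_sample(feat, grid_norm, align_corners=True)

class JointSaliencyFlowNet(nn.Module):
    """Toy two-task network: a shared encoder, a flow head (auxiliary
    task), and a saliency head fed with flow-warped neighbor features."""
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.flow_head = nn.Conv2d(2 * ch, 2, 3, padding=1)
        self.sal_head = nn.Conv2d(2 * ch, 1, 3, padding=1)

    def forward(self, frame_t, frame_prev):
        f_t, f_p = self.encoder(frame_t), self.encoder(frame_prev)
        flow = self.flow_head(torch.cat((f_t, f_p), dim=1))  # estimated flow
        f_p_warped = warp(f_p, flow)                         # temporal alignment
        sal = self.sal_head(torch.cat((f_t, f_p_warped), dim=1))
        return sal, flow

def joint_loss(sal_logits, sal_gt, flow_pred, flow_gt, lam=0.5):
    """Joint objective: saliency BCE plus a weighted flow term
    (lam is an assumed weight, not a published hyper-parameter)."""
    l_sal = F.binary_cross_entropy_with_logits(sal_logits, sal_gt)
    l_flow = F.l1_loss(flow_pred, flow_gt)
    return l_sal + lam * l_flow

# Usage: sal is (B, 1, H, W) saliency logits, flow is (B, 2, H, W).
net = JointSaliencyFlowNet()
a, b = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
sal, flow = net(a, b)
```

Because the flow here is predicted by the network itself rather than precomputed, gradients pass through both heads during training, which is what makes such a scheme end-to-end trainable and, in the authors' terms, self-sufficient.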
DOI: 10.1109/TMM.2021.3129052
ISSN: 1520-9210
EISSN: 1941-0077
Source: IEEE Xplore (Online service)
Subjects:
Artificial neural networks
Datasets
deep network
Estimation
feature enhancing module
Feature extraction
joint training
Machine learning
Object detection
Object recognition
Optical communication
Optical flow (image analysis)
Optical imaging
Salience
Self sufficiency
Spatiotemporal phenomena
State of the art
Task analysis
Video salient object detection
Visualization