Learning Physical-Spatio-Temporal Features for Video Shadow Removal
Shadow removal in a single image has received increasing attention in recent years. However, removing shadows in dynamic scenes remains largely under-explored. In this paper, we propose the first data-driven video shadow removal model, termed PSTNet, which exploits three essential characteristics of video shadows: physical properties, spatial relations, and temporal coherence. Specifically, we establish a dedicated physical branch to conduct local illumination estimation, which is better suited to scenes with complex lighting and textures, and then enhance the physical features via a mask-guided attention strategy. We then develop a progressive aggregation module to enhance the spatial and temporal characteristics of the feature maps and to effectively integrate the three kinds of features. Furthermore, to address the lack of paired shadow-video datasets, we synthesize a dataset (SVSRD-85) with the aid of the popular game GTAV by controlling the switch of the shadow renderer. Experiments against 9 state-of-the-art models, including image shadow removers and image/video restoration methods, show that our method improves on the best prior RMSE in the shadow area by 14.7%. In addition, we develop a lightweight model-adaptation strategy that makes our synthetically trained model effective in real-world scenes. A visual comparison on the public SBU-TimeLapse dataset verifies the generalization ability of our model in real scenes.
Published in: | IEEE Transactions on Circuits and Systems for Video Technology, 2024-07, Vol. 34 (7), pp. 5830-5842 |
---|---|
Main Authors: | Chen, Zhihao; Wan, Liang; Xiao, Yefan; Zhu, Lei; Fu, Huazhu |
Format: | Article |
Language: | English |
Subjects: | Adaptation models; Circuits and systems; Datasets; Effectiveness; Feature extraction; Games; Illumination; Image restoration; Lighting; physical-spatio-temporal features; Root-mean-square errors; Shadows; Strategy; synthetic scenes; Task analysis; Video shadow removal |
Publisher: | IEEE, New York |
DOI: | 10.1109/TCSVT.2024.3369910 |
ISSN: | 1051-8215 (print); 1558-2205 (electronic) |
Source: | IEEE Xplore (Online service) |
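The abstract above describes a mask-guided attention strategy for enhancing the physical features. The record does not include the paper's actual formulation, so the following is only a minimal PyTorch-style sketch of one common way a shadow mask is used to gate spatial attention over a feature map; the module name, layer sizes, and residual form are illustrative assumptions, not PSTNet's implementation.

```python
import torch
import torch.nn as nn

class MaskGuidedAttention(nn.Module):
    """Minimal sketch: a shadow mask biases spatial attention over a
    feature map. Illustrative assumption only, not the PSTNet code."""

    def __init__(self, channels: int):
        super().__init__()
        # A small 1x1-conv head maps (features + mask) to one attention map.
        self.attn = nn.Sequential(
            nn.Conv2d(channels + 1, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) physical features; mask: (B, 1, H, W) shadow mask.
        a = self.attn(torch.cat([feat, mask], dim=1))  # (B, 1, H, W), values in [0, 1]
        return feat + feat * a  # residual re-weighting keeps non-shadow features intact

# Toy usage with random tensors.
feat = torch.randn(2, 64, 128, 128)
mask = torch.rand(2, 1, 128, 128)
print(MaskGuidedAttention(64)(feat, mask).shape)  # torch.Size([2, 64, 128, 128])
```

The residual form `feat + feat * a` is one common design choice here: the mask-conditioned map amplifies features inside the shadow region without suppressing features elsewhere.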
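The abstract reports its main quantitative result as RMSE restricted to the shadow area, improved by 14.7% over the best prior method. The record does not state the evaluation protocol (color space, normalization), so the snippet below is only a hedged NumPy sketch of the general masked-RMSE idea; the function name and setup are illustrative, not the paper's evaluation code.

```python
import numpy as np

def shadow_area_rmse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """RMSE restricted to pixels inside the shadow mask.

    pred, target: (H, W, C) float images; mask: (H, W) binary shadow mask.
    A sketch of the usual masked metric; the paper's exact protocol is
    not specified in this record.
    """
    m = mask.astype(bool)
    diff = pred[m] - target[m]  # boolean indexing -> (num_shadow_pixels, C)
    return float(np.sqrt(np.mean(diff ** 2)))

# Toy usage: a noisy prediction against a random target image.
rng = np.random.default_rng(0)
target = rng.random((64, 64, 3))
pred = target + 0.05 * rng.standard_normal((64, 64, 3))
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1  # a square "shadow" region
print(shadow_area_rmse(pred, target, mask))  # roughly 0.05
```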