
Multi-Frame Self-Supervised Depth Estimation with Multi-Scale Feature Fusion in Dynamic Scenes

Multi-frame methods improve monocular depth estimation over single-frame approaches by aggregating spatial-temporal information via feature matching. However, spatial-temporal feature matching degrades accuracy in dynamic scenes, and recent methods tend to address this with complex architectures for feature matching and dynamic-scene handling. In this paper, we show that a simple learning framework, together with carefully designed feature augmentation, leads to superior performance. (1) A novel, geometrically explainable dynamic-object detection method is proposed. The detected dynamic objects are excluded during training, which preserves the static-environment assumption and relieves the accuracy degradation of multi-frame depth estimation. (2) Multi-scale feature fusion is proposed for feature matching in the multi-frame depth network, which improves matching, especially between frames with large camera motion. (3) Robust knowledge distillation, with a robust teacher network and a reliability guarantee, is proposed; it improves multi-frame depth estimation without increasing computational complexity at test time. Experiments show that the proposed methods achieve substantial improvements in multi-frame depth estimation.

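As a reading aid, here is a minimal sketch of how contribution (1) might look in training code: pixels belonging to detected dynamic objects are simply masked out of the self-supervised photometric loss. The mask source, the function names, and the SSIM/L1 mix are assumptions for illustration, not the authors' implementation.

```python
# Sketch of contribution (1): excluding detected dynamic objects from the
# self-supervised photometric loss. The detector itself is not specified here;
# `static_mask` (1 = static pixel, 0 = dynamic) stands in for its output.
import torch
import torch.nn.functional as F

def masked_photometric_loss(target, warped, static_mask, alpha=0.85):
    """Photometric error (the SSIM + L1 mix common in self-supervised depth)
    averaged over static pixels only.

    target, warped: (B, 3, H, W) images; `warped` is the source frame
        reprojected into the target view using predicted depth and pose.
    static_mask: (B, 1, H, W) binary mask, 0 where a dynamic object was detected.
    """
    l1 = (target - warped).abs().mean(1, keepdim=True)   # (B, 1, H, W)
    ssim = simplified_ssim(target, warped)               # (B, 1, H, W)
    photo = alpha * ssim + (1 - alpha) * l1
    # Average only over pixels that satisfy the static-scene assumption.
    return (photo * static_mask).sum() / static_mask.sum().clamp(min=1.0)

def simplified_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel (1 - SSIM) / 2 using 3x3 average-pooling windows."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return ((1 - num / den) / 2).clamp(0, 1).mean(1, keepdim=True)
```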

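Contribution (2), fusing encoder features from several scales into a single matching feature before frame-to-frame matching, could be sketched as below; the channel sizes and the concatenate-then-project design are assumptions, not the paper's architecture.

```python
# Sketch of contribution (2): multi-scale feature fusion. Coarse encoder
# features are upsampled to the finest resolution, concatenated, and projected
# down, so that matching sees both fine detail and large-displacement context.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFeatureFusion(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.project = nn.Sequential(
            nn.Conv2d(sum(in_channels), out_channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feats):
        # feats: list of (B, C_i, H_i, W_i) encoder outputs, finest scale first.
        h, w = feats[0].shape[-2:]
        upsampled = [feats[0]] + [
            F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
            for f in feats[1:]
        ]
        return self.project(torch.cat(upsampled, dim=1))

# Usage on dummy multi-scale encoder outputs:
fusion = MultiScaleFeatureFusion()
feats = [torch.randn(2, 64, 96, 320), torch.randn(2, 128, 48, 160),
         torch.randn(2, 256, 24, 80)]
fused = fusion(feats)  # (2, 128, 96, 320), used for feature matching
```
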
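Contribution (3) can be illustrated with a reliability-weighted distillation loss; treating pixels where the teacher's reprojection error beats the student's as "reliable" is an assumed criterion, not necessarily the paper's. Because the teacher is dropped after training, the student incurs no extra cost at test time, matching the abstract's claim.

```python
# Sketch of contribution (3): distilling a robust teacher into the multi-frame
# student only where the teacher is judged reliable.
import torch

def reliability_weighted_distillation(student_depth, teacher_depth,
                                      student_err, teacher_err):
    """student_depth, teacher_depth: (B, 1, H, W) predicted depth maps.
    student_err, teacher_err: (B, 1, H, W) per-pixel photometric errors of
    the reprojection induced by each depth map."""
    # Trust the teacher only where its depth explains the images better.
    reliable = (teacher_err < student_err).float()
    diff = (student_depth - teacher_depth.detach()).abs()
    return (diff * reliable).sum() / reliable.sum().clamp(min=1.0)
```
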
Bibliographic Details
Published in: arXiv.org, 2023-12
Main Authors: Zhong, Jiquan; Huang, Xiaolin; Yu, Xiao
Format: Article
Language: English
Subjects: Complexity; Degradation; Distillation; Matching; Network reliability; Robustness
DOI: 10.48550/arxiv.2303.14628
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Online Access: Get full text