
Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning


Bibliographic Details
Published in: ACM transactions on multimedia computing communications and applications, 2024-08, Vol. 20 (8), p. 1-19, Article 250
Main Authors: Peng, Bo; Sun, Lin; Lei, Jianjun; Liu, Bingzheng; Shen, Haifeng; Li, Wanqing; Huang, Qingming
Format: Article
Language: English
Subjects: Information systems; Multimedia information systems
cited_by cdi_FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3
cites cdi_FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3
container_end_page 19
container_issue 8
container_start_page 1
container_title ACM transactions on multimedia computing communications and applications
container_volume 20
creator Peng, Bo
Sun, Lin
Lei, Jianjun
Liu, Bingzheng
Shen, Haifeng
Li, Wanqing
Huang, Qingming
description Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, on the other hand, do not require any annotation of ground-truth depth and have recently attracted increasing attention. In this work, we propose a self-supervised monocular depth estimation network via binocular geometric correlation learning. Specifically, considering the inter-view geometric correlation, a binocular cue prediction module is presented to generate the auxiliary vision cue for the self-supervised learning of monocular depth estimation. Then, to deal with the occlusion in depth estimation, an occlusion interference attenuated constraint is developed to guide the supervision of the network by inferring the occlusion region and producing paired occlusion masks. Experimental results on two popular benchmark datasets have demonstrated that the proposed network obtains competitive results compared to state-of-the-art self-supervised methods and achieves comparable results to some popular supervised methods.
doi_str_mv 10.1145/3663570
format article
fullrecord <record><control><sourceid>acm_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1145_3663570</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3663570</sourcerecordid><originalsourceid>FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3</originalsourceid><addsrcrecordid>eNo9kL1PwzAUxC0EEqUgdiZvTAG_-DMjlFKQghgKYoxc1wajJI7stFL_e4LSdnr3dD-dTofQNZA7AMbvqRCUS3KCJsA5ZEIJfnrUXJ6ji5R-CaGCMzFBX0tbu2y56Wzc-mTX-C20wWxqHfGT7fofPE-9b3TvQ4u3XuNHf7AXNjS2j97gWYjR1iNTWh1b335fojOn62Sv9neKPp_nH7OXrHxfvM4eykyDZH1GmQayKsAUxirnKKhcacilKgRhSlKRC0Hc8EmlQBLO12alpGK5ks5QbugU3Y65JoaUonVVF4e-cVcBqf73qPZ7DOTNSGrTHKGD-QflBlm1</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning</title><source>Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)</source><creator>Peng, Bo ; Sun, Lin ; Lei, Jianjun ; Liu, Bingzheng ; Shen, Haifeng ; Li, Wanqing ; Huang, Qingming</creator><creatorcontrib>Peng, Bo ; Sun, Lin ; Lei, Jianjun ; Liu, Bingzheng ; Shen, Haifeng ; Li, Wanqing ; Huang, Qingming</creatorcontrib><description>Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, on the other hand, do not require any annotation of ground-truth depth and have recently attracted increasing attention. In this work, we propose a self-supervised monocular depth estimation network via binocular geometric correlation learning. 
Specifically, considering the inter-view geometric correlation, a binocular cue prediction module is presented to generate the auxiliary vision cue for the self-supervised learning of monocular depth estimation. Then, to deal with the occlusion in depth estimation, an occlusion interference attenuated constraint is developed to guide the supervision of the network by inferring the occlusion region and producing paired occlusion masks. Experimental results on two popular benchmark datasets have demonstrated that the proposed network obtains competitive results compared to state-of-the-art self-supervised methods and achieves comparable results to some popular supervised methods.</description><identifier>ISSN: 1551-6857</identifier><identifier>EISSN: 1551-6865</identifier><identifier>DOI: 10.1145/3663570</identifier><language>eng</language><publisher>New York, NY: ACM</publisher><subject>Information systems ; Multimedia information systems</subject><ispartof>ACM transactions on multimedia computing communications and applications, 2024-08, Vol.20 (8), p.1-19, Article 250</ispartof><rights>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. 
Request permissions from</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3</citedby><cites>FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3</cites><orcidid>0000-0003-3171-7680 ; 0000-0002-6949-4147 ; 0000-0002-2654-3084 ; 0000-0001-7542-296X ; 0000-0002-6616-453X ; 0009-0002-9794-4915 ; 0000-0002-4427-2687</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Peng, Bo</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Lei, Jianjun</creatorcontrib><creatorcontrib>Liu, Bingzheng</creatorcontrib><creatorcontrib>Shen, Haifeng</creatorcontrib><creatorcontrib>Li, Wanqing</creatorcontrib><creatorcontrib>Huang, Qingming</creatorcontrib><title>Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning</title><title>ACM transactions on multimedia computing communications and applications</title><addtitle>ACM TOMM</addtitle><description>Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. Self-supervised methods, on the other hand, do not require any annotation of ground-truth depth and have recently attracted increasing attention. In this work, we propose a self-supervised monocular depth estimation network via binocular geometric correlation learning. Specifically, considering the inter-view geometric correlation, a binocular cue prediction module is presented to generate the auxiliary vision cue for the self-supervised learning of monocular depth estimation. 
Then, to deal with the occlusion in depth estimation, an occlusion interference attenuated constraint is developed to guide the supervision of the network by inferring the occlusion region and producing paired occlusion masks. Experimental results on two popular benchmark datasets have demonstrated that the proposed network obtains competitive results compared to state-of-the-art self-supervised methods and achieves comparable results to some popular supervised methods.</description><subject>Information systems</subject><subject>Multimedia information systems</subject><issn>1551-6857</issn><issn>1551-6865</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNo9kL1PwzAUxC0EEqUgdiZvTAG_-DMjlFKQghgKYoxc1wajJI7stFL_e4LSdnr3dD-dTofQNZA7AMbvqRCUS3KCJsA5ZEIJfnrUXJ6ji5R-CaGCMzFBX0tbu2y56Wzc-mTX-C20wWxqHfGT7fofPE-9b3TvQ4u3XuNHf7AXNjS2j97gWYjR1iNTWh1b335fojOn62Sv9neKPp_nH7OXrHxfvM4eykyDZH1GmQayKsAUxirnKKhcacilKgRhSlKRC0Hc8EmlQBLO12alpGK5ks5QbugU3Y65JoaUonVVF4e-cVcBqf73qPZ7DOTNSGrTHKGD-QflBlm1</recordid><startdate>20240831</startdate><enddate>20240831</enddate><creator>Peng, Bo</creator><creator>Sun, Lin</creator><creator>Lei, Jianjun</creator><creator>Liu, Bingzheng</creator><creator>Shen, Haifeng</creator><creator>Li, Wanqing</creator><creator>Huang, Qingming</creator><general>ACM</general><scope>AAYXX</scope><scope>CITATION</scope><orcidid>https://orcid.org/0000-0003-3171-7680</orcidid><orcidid>https://orcid.org/0000-0002-6949-4147</orcidid><orcidid>https://orcid.org/0000-0002-2654-3084</orcidid><orcidid>https://orcid.org/0000-0001-7542-296X</orcidid><orcidid>https://orcid.org/0000-0002-6616-453X</orcidid><orcidid>https://orcid.org/0009-0002-9794-4915</orcidid><orcidid>https://orcid.org/0000-0002-4427-2687</orcidid></search><sort><creationdate>20240831</creationdate><title>Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning</title><author>Peng, Bo ; Sun, 
Lin ; Lei, Jianjun ; Liu, Bingzheng ; Shen, Haifeng ; Li, Wanqing ; Huang, Qingming</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Information systems</topic><topic>Multimedia information systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Peng, Bo</creatorcontrib><creatorcontrib>Sun, Lin</creatorcontrib><creatorcontrib>Lei, Jianjun</creatorcontrib><creatorcontrib>Liu, Bingzheng</creatorcontrib><creatorcontrib>Shen, Haifeng</creatorcontrib><creatorcontrib>Li, Wanqing</creatorcontrib><creatorcontrib>Huang, Qingming</creatorcontrib><collection>CrossRef</collection><jtitle>ACM transactions on multimedia computing communications and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Peng, Bo</au><au>Sun, Lin</au><au>Lei, Jianjun</au><au>Liu, Bingzheng</au><au>Shen, Haifeng</au><au>Li, Wanqing</au><au>Huang, Qingming</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning</atitle><jtitle>ACM transactions on multimedia computing communications and applications</jtitle><stitle>ACM TOMM</stitle><date>2024-08-31</date><risdate>2024</risdate><volume>20</volume><issue>8</issue><spage>1</spage><epage>19</epage><pages>1-19</pages><artnum>250</artnum><issn>1551-6857</issn><eissn>1551-6865</eissn><abstract>Monocular depth estimation aims to infer a depth map from a single image. Although supervised learning-based methods have achieved remarkable performance, they generally rely on a large amount of labor-intensively annotated data. 
Self-supervised methods, on the other hand, do not require any annotation of ground-truth depth and have recently attracted increasing attention. In this work, we propose a self-supervised monocular depth estimation network via binocular geometric correlation learning. Specifically, considering the inter-view geometric correlation, a binocular cue prediction module is presented to generate the auxiliary vision cue for the self-supervised learning of monocular depth estimation. Then, to deal with the occlusion in depth estimation, an occlusion interference attenuated constraint is developed to guide the supervision of the network by inferring the occlusion region and producing paired occlusion masks. Experimental results on two popular benchmark datasets have demonstrated that the proposed network obtains competitive results compared to state-of-the-art self-supervised methods and achieves comparable results to some popular supervised methods.</abstract><cop>New York, NY</cop><pub>ACM</pub><doi>10.1145/3663570</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0003-3171-7680</orcidid><orcidid>https://orcid.org/0000-0002-6949-4147</orcidid><orcidid>https://orcid.org/0000-0002-2654-3084</orcidid><orcidid>https://orcid.org/0000-0001-7542-296X</orcidid><orcidid>https://orcid.org/0000-0002-6616-453X</orcidid><orcidid>https://orcid.org/0009-0002-9794-4915</orcidid><orcidid>https://orcid.org/0000-0002-4427-2687</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1551-6857
ispartof ACM transactions on multimedia computing communications and applications, 2024-08, Vol.20 (8), p.1-19, Article 250
issn 1551-6857
1551-6865
language eng
recordid cdi_crossref_primary_10_1145_3663570
source Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list)
subjects Information systems
Multimedia information systems
title Self-Supervised Monocular Depth Estimation via Binocular Geometric Correlation Learning
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T17%3A09%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Self-Supervised%20Monocular%20Depth%20Estimation%20via%20Binocular%20Geometric%20Correlation%20Learning&rft.jtitle=ACM%20transactions%20on%20multimedia%20computing%20communications%20and%20applications&rft.au=Peng,%20Bo&rft.date=2024-08-31&rft.volume=20&rft.issue=8&rft.spage=1&rft.epage=19&rft.pages=1-19&rft.artnum=250&rft.issn=1551-6857&rft.eissn=1551-6865&rft_id=info:doi/10.1145/3663570&rft_dat=%3Cacm_cross%3E3663570%3C/acm_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a174t-34a10b91c9ce8ff31828a1278960487362660f89678817055dcb8784287fc35c3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true