Self-Supervised Monocular Depth Estimation for All-Day Images Based on Dual-Axis Transformer
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-10, Vol. 34 (10), pp. 9939-9953
Format: Article
Language: English
Summary: All-day self-supervised monocular depth estimation is of strong practical significance for autonomous systems that must continuously perceive the 3D structure of the world. Night-time scenes, however, pose two challenges: weak texture caused by low illumination, and violation of the brightness-consistency assumption caused by varying lighting. As a result, most existing self-supervised models can handle only day-time scenes. To address this, we propose a unified self-supervised monocular depth estimation framework for all-day scenarios with three features: 1) an Illumination Compensation PoseNet (ICP), based on classic Phong illumination theory, which compensates for lighting changes between adjacent frames by estimating per-pixel transformations; 2) a Dual-Axis Transformer (DAT) block, used as the backbone of the depth encoder, which infers the depth of locally low-illumination areas from the spatial-channel dual-dimensional global context of night-time images; 3) a cross-layer Adaptive Fusion Module (AFM) between DAT blocks, which learns attention weights over features from different layers and adaptively fuses the cross-layer features with those weights, enhancing their complementarity. The method was evaluated on the RobotCar, Waymo, and KITTI datasets, achieving state-of-the-art results in both day-time and night-time scenarios.
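To make the illumination-compensation idea concrete, the following is a minimal NumPy sketch (not the paper's actual implementation; all function and variable names here are assumptions). The core idea attributed to ICP is that, in addition to pose, the network predicts per-pixel transformation maps that align the lighting of the warped source frame with the target frame before the photometric loss is computed. The sketch models that transformation as a per-pixel affine map (multiplicative contrast `c` and additive brightness `b`) and shows how compensation can lower the photometric error under a simulated lighting change.

```python
import numpy as np

def compensate(i_src_warped, c, b):
    """Per-pixel affine illumination transform: I' = c * I + b (elementwise)."""
    return c * i_src_warped + b

def photometric_l1(i_tgt, i_src_warped, c, b):
    """Mean L1 photometric loss after illumination compensation."""
    return np.abs(i_tgt - compensate(i_src_warped, c, b)).mean()

rng = np.random.default_rng(0)
i_tgt = rng.random((4, 4))            # target frame intensities in [0, 1]
# Simulate a night-time lighting change: the source frame is globally
# darkened and brightness-shifted relative to the target frame.
i_src = 0.5 * i_tgt + 0.2
# An ideal compensation network would predict maps that invert that change
# per pixel: c = 2.0 and b = -0.4 recover I exactly (hand-set here).
c = np.full((4, 4), 2.0)
b = np.full((4, 4), -0.4)

loss_raw = np.abs(i_tgt - i_src).mean()          # loss without compensation
loss_comp = photometric_l1(i_tgt, i_src, c, b)   # loss with compensation
print(loss_comp < loss_raw)                      # True: compensation helps
```

In the real framework the per-pixel maps would be regressed by the pose network from the image pair rather than hand-set, but the loss structure is the same: compensation removes the brightness-consistency violation so the residual error reflects geometry rather than lighting.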
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3406043