DVT: Decoupled Dual-Branch View Transformation for Monocular Bird's Eye View Semantic Segmentation

Bibliographic Details
Main Authors: Du, Jiayuan, Pan, Xianghui, Shen, Mengjiao, Su, Shuai, Yang, Jingwei, Liu, Chengju, Chen, Qijun
Format: Conference Proceeding
Language: English
Description
Summary: Monocular Bird's Eye View (BEV) semantic segmentation is critical for autonomous driving because of its inherent advantages for spatial representation and downstream tasks. However, it is challenging to learn view transformation and pixel-wise classification simultaneously, and previous works suffer from non-flat region distortion, distant depth ambiguity, and visual occlusion. To address these concerns, we propose dual-branch view transformation (DVT), a novel framework for monocular BEV semantic segmentation. Our method consists of: (i) a dual-branch view transformation that decouples features into flat and non-flat regions and processes them independently; (ii) a depth-aware weighting method that makes the model pay more attention to distant regions; (iii) an auxiliary task that introduces additional inductive biases to alleviate the inaccuracy caused by visual occlusion. Furthermore, we design a class-aware weighting method to address the class and size imbalance of datasets. Experimental results on the nuScenes and KITTI-360 datasets demonstrate that DVT outperforms previous state-of-the-art (SOTA) methods. Our code is available at https://github.com/MrPicklesGG/DVT.
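
The abstract does not give the exact formulation of the depth-aware and class-aware weightings, so the following PyTorch sketch is only a rough illustration of how such weights could enter a BEV segmentation loss. The function name weighted_bev_loss, the arguments class_freq and depth_map, and the inverse-frequency and linear-in-distance formulas are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def weighted_bev_loss(logits, target, class_freq, depth_map, ignore_index=255):
    """Cross-entropy over BEV cells with two multiplicative weights (a sketch):
    - class-aware: rarer classes get larger weight (inverse frequency here),
    - depth-aware: cells farther from the ego vehicle get larger weight.

    logits:     (B, C, H, W) BEV segmentation logits
    target:     (B, H, W)    ground-truth class indices (long)
    class_freq: (C,)         per-class cell frequency on the training set
    depth_map:  (B, H, W)    distance of each BEV cell from the ego vehicle
    """
    # Class-aware weights: inverse frequency, normalized to mean 1
    # (one plausible choice; the paper's exact scheme may differ).
    class_w = 1.0 / class_freq.clamp(min=1e-6)
    class_w = (class_w / class_w.mean()).to(logits)

    # Per-cell cross-entropy, weighted by class.
    ce = F.cross_entropy(logits, target, weight=class_w,
                         ignore_index=ignore_index, reduction="none")  # (B, H, W)

    # Depth-aware weights: grow linearly with distance so distant cells
    # contribute more to the loss.
    depth_w = 1.0 + depth_map / depth_map.max().clamp(min=1e-6)

    valid = (target != ignore_index).float()
    return (ce * depth_w * valid).sum() / valid.sum().clamp(min=1.0)
```

The intent of both weights is the same: counteract the tendency of a plain cross-entropy loss to be dominated by nearby cells and by large, frequent classes such as road, which is consistent with the motivation stated in the abstract.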
ISSN: 2153-0866
DOI: 10.1109/IROS58592.2024.10802126