Lightweight LiDAR-Camera Alignment With Homogeneous Local-Global Aware Representation
Published in: IEEE Transactions on Intelligent Transportation Systems, 2024-11, Vol. 25 (11), pp. 15922-15933
Main Authors:
Format: Article
Language: English
Summary: In this paper, a novel LiDAR-Camera Alignment (LCA) method using a homogeneous local-global spatial aware representation is proposed. Compared with state-of-the-art methods (e.g., LCCNet), our proposition offers two main advantages. First, a homogeneous multi-modality representation learned with a uniform CNN model is applied along the iterative prediction stages, instead of the heterogeneous counterparts extracted by separate modality-wise CNN models within each stage. In this way, the model size can be significantly decreased (e.g., 12.39M (ours) vs. 333.75M (LCCNet)). Meanwhile, within our proposition the interaction between LiDAR and camera data is built during feature learning to better exploit the descriptive clues, which has not been well addressed by existing approaches. Second, we propose to equip the learned LCA representation with local-global spatial awareness by encoding the CNN's local convolutional features in the Transformer's non-local self-attention manner. Accordingly, local fine details and global spatial context can be jointly captured by the encoded local features and jointly used for LCA. In contrast, existing methods generally reveal the global spatial property by simply concatenating the local features. Additionally, at the initial LCA stage, LiDAR is roughly aligned with the camera by our pre-alignment method, according to the point distribution characteristics of its 2D projection under the initial extrinsic parameters. Although its structure is simple, it essentially alleviates LCA's difficulty for the subsequent stages. To better optimize LCA, a novel loss function that builds the correlation between the translation and rotation loss items is also proposed. Experiments on KITTI data verify the superiority of our proposition in both effectiveness and efficiency. The source code will be released at https://github.com/Zaf233/Light-weight-LCA upon acceptance.
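To make the local-global aware encoding described in the summary more concrete, the sketch below flattens CNN feature-map locations into tokens and applies Transformer self-attention over them, so each local feature also carries global spatial context. This is a minimal PyTorch-style sketch under assumed shapes, class names, and hyper-parameters (e.g., `LocalGlobalEncoder`), not the authors' released implementation.

```python
import torch
import torch.nn as nn


class LocalGlobalEncoder(nn.Module):
    """Hypothetical sketch: encode local CNN features with non-local self-attention."""

    def __init__(self, channels=256, num_heads=8, num_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.attn = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feat):
        # feat: (B, C, H, W) local convolutional features
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C): one token per location
        tokens = self.attn(tokens)                 # self-attention over all locations
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Usage example with an assumed feature-map size:
enc = LocalGlobalEncoder(channels=256)
out = enc(torch.randn(2, 256, 16, 32))             # same shape, now globally context-aware
```

The design choice illustrated here is that the Transformer operates on the CNN's local features themselves rather than replacing them, so fine local detail and global spatial context end up in the same representation.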
ISSN: 1524-9050, 1558-0016
DOI: 10.1109/TITS.2024.3409397