
2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision

Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, it is cost-effective to localize the robot with only an onboard camera rather than a LiDAR. However, matching 2D visual information with 3D point cloud maps presents major challenges due to the different modalities and dimensions, as well as noise and occlusion. …

Full description

Saved in:
Bibliographic Details
Main Authors: Pan, Jin; Mu, Xiangru; Qin, Tong; Xu, Chunjing; Yang, Ming
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 912
container_start_page 906
container_title 2024 IEEE Intelligent Vehicles Symposium (IV)
creator Pan, Jin
Mu, Xiangru
Qin, Tong
Xu, Chunjing
Yang, Ming
description Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, it is cost-effective to localize the robot with only an onboard camera rather than a LiDAR. However, matching 2D visual information with 3D point cloud maps presents major challenges due to the different modalities and dimensions, as well as noise and occlusion. To overcome these challenges, we propose an end-to-end neural-network-based solution that determines the 6-DoF pose of the camera relative to an existing LiDAR map with centimeter accuracy. Given a query image, a pre-acquired point cloud, and an initial pose, the cross-modality network outputs a precise pose. By projecting the 3D point cloud onto the image plane, a depth image as seen from the initial pose is acquired. Subsequently, a cross-modality flow network establishes correspondences between 2D pixels and projected points. Importantly, we leverage a robust probabilistic Perspective-n-Point (PnP) module, which is capable of fine-tuning the 2D-3D pairs and learning per-pair weights in an end-to-end manner. A comprehensive evaluation of the proposed algorithm is conducted on the KITTI dataset. Furthermore, deploying the algorithm in a real-world parking lot scenario validates its strong practicality. We highlight that this research offers a cost-effective and highly accurate solution that can be readily deployed in low-cost commercial vehicles.
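To make the pipeline in the description concrete, here is a minimal sketch (not the authors' code) of its first step: rendering a sparse depth image by projecting the LiDAR map through a pinhole camera placed at the initial pose guess. All names (render_depth_image, T_init, K) and the camera-from-world pose convention are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def render_depth_image(points_map, T_init, K, height, width):
    """Project LiDAR map points into a sparse depth image.

    points_map: (N, 3) map points in the world frame.
    T_init:     (4, 4) initial camera-from-world pose guess.
    K:          (3, 3) pinhole camera intrinsics.
    """
    # Transform map points into the camera frame of the initial pose.
    pts_h = np.hstack([points_map, np.ones((len(points_map), 1))])
    pts_cam = (T_init @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Pinhole projection onto the image plane.
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    depth = pts_cam[:, 2]

    # Discard projections that fall outside the image bounds.
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[ok], v[ok], depth[ok]

    # Z-buffer: write far points first so near points overwrite them
    # (NumPy assignment keeps the last value for duplicate indices).
    img = np.zeros((height, width))
    order = np.argsort(-depth)
    img[v[order], u[order]] = depth[order]
    return img
```

The depth image and the RGB query then go to the flow network, whose 2D-3D correspondences and confidences feed the probabilistic PnP stage. As a classical stand-in for that stage (the paper's module is differentiable and trained end-to-end, which this sketch is not), pose refinement by weighted reprojection-error minimization might look as follows; refine_pose_weighted and its parameters are again assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose_weighted(pts3d, pts2d, weights, K, rvec0, tvec0):
    """Minimize sum_i w_i * ||project(R @ X_i + t) - x_i||^2."""
    def residuals(params):
        rvec, tvec = params[:3], params[3:]
        # Rotate and translate the 3D points into the camera frame.
        pts_cam = Rotation.from_rotvec(rvec).apply(pts3d) + tvec
        proj = (K @ pts_cam.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        # Scale residuals by sqrt(w_i): squaring recovers the weighted cost.
        return (np.sqrt(weights)[:, None] * (proj - pts2d)).ravel()

    sol = least_squares(residuals, np.concatenate([rvec0, tvec0]), method="lm")
    return sol.x[:3], sol.x[3:]  # refined axis-angle rotation and translation
```

Down-weighting unreliable pairs in this way is what lets the end-to-end version learn which correspondences to trust, rather than rejecting them with a hard RANSAC threshold.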
doi_str_mv 10.1109/IV55156.2024.10588575
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2642-7214
ispartof 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, p.906-912
issn 2642-7214
language eng
recordid cdi_ieee_primary_10588575
source IEEE Xplore All Conference Series
subjects Accuracy
Laser radar
Location awareness
Point cloud compression
Robot vision systems
Three-dimensional displays
Visualization
title 2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision