
2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision

Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, it is cost-effective to localize the robot with only an onboard camera rather than a LiDAR. However, matching 2D visual information with 3D point cloud maps presents major challenges due to the different modalities and dimensions, as well as noise and occlusion. …

Full description

Saved in:
Bibliographic Details
Main Authors: Pan, Jin; Mu, Xiangru; Qin, Tong; Xu, Chunjing; Yang, Ming
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
container_end_page 912
container_start_page 906
container_title 2024 IEEE Intelligent Vehicles Symposium (IV)
creator Pan, Jin
Mu, Xiangru
Qin, Tong
Xu, Chunjing
Yang, Ming
description Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, it is cost-effective to localize the robot with only an onboard camera rather than a LiDAR. However, matching 2D visual information with 3D point cloud maps presents major challenges due to the different modalities and dimensions, as well as noise and occlusion. To overcome these challenges, we propose an end-to-end neural-network-based solution that determines the 6-DoF pose of the camera relative to an existing LiDAR map with centimeter accuracy. Given a query image, a pre-acquired point cloud, and an initial pose, the cross-modality network outputs a precise pose. By projecting the 3D point cloud onto the image plane, a depth image as seen from the initial pose is acquired. Subsequently, a cross-modality flow network establishes correspondences between 2D pixels and projected points. Importantly, we leverage a robust probabilistic Perspective-n-Point (PnP) module, which is capable of fine-tuning the 2D-3D pairs and learning per-pair weights in an end-to-end manner. A comprehensive evaluation of the proposed algorithm is conducted on the KITTI dataset. Furthermore, deploying the algorithm in a real-world parking lot scenario validates its strong practicality. We highlight that this research offers a cost-effective and highly accurate solution that can be readily deployed in low-cost commercial vehicles.
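To make the pipeline in the description concrete, here is a minimal sketch (not the authors' code) of its first step: rendering a sparse depth image by projecting the LiDAR map through a pinhole camera placed at the initial pose guess. All names (render_depth_image, T_init, K) and the camera-from-world pose convention are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np

def render_depth_image(points_map, T_init, K, height, width):
    """Project LiDAR map points into a sparse depth image.

    points_map: (N, 3) map points in the world frame.
    T_init:     (4, 4) initial camera-from-world pose guess.
    K:          (3, 3) pinhole camera intrinsics.
    """
    # Transform map points into the camera frame of the initial pose.
    pts_h = np.hstack([points_map, np.ones((len(points_map), 1))])
    pts_cam = (T_init @ pts_h.T).T[:, :3]

    # Keep only points in front of the camera.
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]

    # Pinhole projection onto the image plane.
    uv = (K @ pts_cam.T).T
    u = np.round(uv[:, 0] / uv[:, 2]).astype(int)
    v = np.round(uv[:, 1] / uv[:, 2]).astype(int)
    depth = pts_cam[:, 2]

    # Discard projections that fall outside the image bounds.
    ok = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, depth = u[ok], v[ok], depth[ok]

    # Z-buffer: write far points first so near points overwrite them
    # (NumPy assignment keeps the last value for duplicate indices).
    img = np.zeros((height, width))
    order = np.argsort(-depth)
    img[v[order], u[order]] = depth[order]
    return img
```

The depth image and the RGB query then go to the flow network, whose 2D-3D correspondences and confidences feed the probabilistic PnP stage. As a classical stand-in for that stage (the paper's module is differentiable and trained end-to-end, which this sketch is not), pose refinement by weighted reprojection-error minimization might look as follows; refine_pose_weighted and its parameters are again assumptions for illustration.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_pose_weighted(pts3d, pts2d, weights, K, rvec0, tvec0):
    """Minimize sum_i w_i * ||project(R @ X_i + t) - x_i||^2."""
    def residuals(params):
        rvec, tvec = params[:3], params[3:]
        # Rotate and translate the 3D points into the camera frame.
        pts_cam = Rotation.from_rotvec(rvec).apply(pts3d) + tvec
        proj = (K @ pts_cam.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        # Scale residuals by sqrt(w_i): squaring recovers the weighted cost.
        return (np.sqrt(weights)[:, None] * (proj - pts2d)).ravel()

    sol = least_squares(residuals, np.concatenate([rvec0, tvec0]), method="lm")
    return sol.x[:3], sol.x[3:]  # refined axis-angle rotation and translation
```

Down-weighting unreliable pairs in this way is what lets the end-to-end version learn which correspondences to trust, rather than rejecting them with a hard RANSAC threshold.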
doi_str_mv 10.1109/IV55156.2024.10588575
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2642-7214
ispartof 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, p.906-912
issn 2642-7214
language eng
recordid cdi_ieee_primary_10588575
source IEEE Xplore All Conference Series
subjects Accuracy
Laser radar
Location awareness
Point cloud compression
Robot vision systems
Three-dimensional displays
Visualization
title 2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision