2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision
Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, localizing the robot with only an onboard camera is far more cost-effective than carrying a LiDAR. However, matching 2D visual information against 3D point cloud maps is highly challenging due to the differing modalities, dimensions, noise, and occlusion.
Main Authors: | Pan, Jin; Mu, Xiangru; Qin, Tong; Xu, Chunjing; Yang, Ming |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Accuracy; Laser radar; Location awareness; Point cloud compression; Robot vision systems; Three-dimensional displays; Visualization |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 912 |
container_issue | |
container_start_page | 906 |
container_title | 2024 IEEE Intelligent Vehicles Symposium (IV) |
container_volume | |
creator | Pan, Jin; Mu, Xiangru; Qin, Tong; Xu, Chunjing; Yang, Ming |
description | Accurate localization is a crucial capability for autonomous robots. Given an existing LiDAR 3D point cloud map, localizing the robot with only an onboard camera is far more cost-effective than carrying a LiDAR. However, matching 2D visual information against 3D point cloud maps is highly challenging due to the differing modalities, dimensions, noise, and occlusion. To overcome these challenges, we propose an end-to-end neural-network-based solution that determines the 6-DoF pose of the camera relative to an existing LiDAR map with centimeter accuracy. Given a query image, a pre-acquired point cloud, and an initial pose, the cross-modality network outputs a precise pose. By projecting the 3D point cloud onto the image plane, a depth image is rendered as seen from the initial pose. A cross-modality flow network then establishes correspondences between 2D pixels and projected points. Importantly, we leverage a robust probabilistic Perspective-n-Point (PnP) module, which is capable of fine-tuning the 2D correspondences and learning per-pair weights in an end-to-end manner. A comprehensive evaluation of the proposed algorithm is conducted on the KITTI dataset. Furthermore, deploying the algorithm in a real-world parking lot scenario validates its strong practicality. We highlight that this research offers a cost-effective and highly accurate solution that can be readily deployed in low-cost commercial vehicles. |
doi_str_mv | 10.1109/IV55156.2024.10588575 |
format | conference_proceeding |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2642-7214 |
ispartof | 2024 IEEE Intelligent Vehicles Symposium (IV), 2024, p.906-912 |
issn | 2642-7214 |
language | eng |
recordid | cdi_ieee_primary_10588575 |
source | IEEE Xplore All Conference Series |
subjects | Accuracy; Laser radar; Location awareness; Point cloud compression; Robot vision systems; Three-dimensional displays; Visualization |
title | 2D-3D Cross-Modality Network for End-to-End Localization with Probabilistic Supervision |
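The abstract describes rendering a depth image by projecting the pre-acquired LiDAR map onto the image plane as seen from the initial pose. The sketch below is a minimal NumPy illustration of that projection step, not the authors' implementation; the function name, the near-plane cutoff, and the z-buffer handling of pixel collisions are all assumptions.

```python
import numpy as np

def project_to_depth_image(points_map, T_cam_map, K, h, w):
    """Render a sparse depth image by projecting LiDAR map points
    into the camera at a given (initial) pose. Illustrative sketch."""
    R, t = T_cam_map[:3, :3], T_cam_map[:3, 3]
    pc = points_map @ R.T + t                  # map frame -> camera frame
    pc = pc[pc[:, 2] > 0.1]                    # assumed near-plane cutoff
    uvz = pc @ K.T                             # pinhole projection
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    z = pc[:, 2]
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.full((h, w), np.inf)
    # z-buffer: write far-to-near so the nearest point wins each pixel
    order = np.argsort(-z)
    depth[v[order], u[order]] = z[order]
    depth[np.isinf(depth)] = 0.0               # 0 marks empty pixels
    return depth
```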
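The abstract also highlights a probabilistic PnP module that learns per-correspondence weights end-to-end. As a classical stand-in (the paper's module is learned and differentiable, which this is not), the following sketch refines a pose by Gauss-Newton on a weighted reprojection error; the se(3) left-perturbation parameterization, damping constant, and function names are illustrative choices.

```python
import numpy as np

def skew(v):
    """3x3 matrix such that skew(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(w):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + skew(w)
    A = skew(w / theta)
    return np.eye(3) + np.sin(theta) * A + (1.0 - np.cos(theta)) * (A @ A)

def weighted_pnp_refine(pts3d, pts2d, weights, K, R, t, n_iters=10):
    """Gauss-Newton refinement of (R, t) minimizing the weighted
    reprojection error of 3D map points against 2D pixel matches."""
    fx, fy = K[0, 0], K[1, 1]
    for _ in range(n_iters):
        pc = pts3d @ R.T + t                   # map points in camera frame
        X, Y, Z = pc[:, 0], pc[:, 1], pc[:, 2]
        u = fx * X / Z + K[0, 2]
        v = fy * Y / Z + K[1, 2]
        r = np.stack([u, v], axis=1) - pts2d   # (N, 2) residuals
        H = np.zeros((6, 6))
        b = np.zeros(6)
        for i in range(len(pts3d)):
            # 2x3 Jacobian of projection wrt the camera-frame point,
            # chained with [I | -skew(pc)] for the se(3) perturbation
            Jp = np.array([[fx / Z[i], 0.0, -fx * X[i] / Z[i] ** 2],
                           [0.0, fy / Z[i], -fy * Y[i] / Z[i] ** 2]])
            Ji = np.hstack([Jp, -Jp @ skew(pc[i])])  # (2, 6)
            H += weights[i] * Ji.T @ Ji
            b += weights[i] * Ji.T @ r[i]
        delta = np.linalg.solve(H + 1e-9 * np.eye(6), -b)  # damped GN step
        R = exp_so3(delta[3:]) @ R             # left-multiplicative update
        t = exp_so3(delta[3:]) @ t + delta[:3]
    return R, t
```

Down-weighting a pair (a small `weights[i]`) mimics what the learned probabilistic weighting achieves: unreliable 2D-3D matches contribute little to the pose update.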