Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS

Bibliographic Details
Main Authors:	Chen, Long; Ai, Haizhou; Chen, Rui; Zhuang, Zijie; Liu, Shuang
Format:	Conference Proceeding
Language:	English
Subjects:	Cameras; Pose estimation; Target tracking; Task analysis; Three-dimensional displays; Two dimensional displays; Videos

container_end_page	3285
container_start_page	3276
creator	Chen, Long; Ai, Haizhou; Chen, Rui; Zhuang, Zijie; Liu, Shuang
description	Estimating 3D poses of multiple humans in real time is a classic but still challenging task in computer vision. Its major difficulty lies in the ambiguity of cross-view association of 2D poses and the huge state space when there are multiple people in multiple views. In this paper, we present a novel solution for multi-human 3D pose estimation from multiple calibrated camera views. It takes 2D poses in different camera coordinates as inputs and aims to estimate accurate 3D poses in the global coordinate system. Unlike previous methods that associate 2D poses among all pairs of views from scratch at every frame, we exploit the temporal consistency in videos to match the 2D inputs with 3D poses directly in 3-space. More specifically, we propose to retain the 3D pose of each person and update these poses iteratively via cross-view multi-human tracking. This novel formulation improves both accuracy and efficiency, as we demonstrate on widely used public datasets. To further verify the scalability of our method, we propose a new large-scale multi-human dataset with 12 to 28 camera views. Without bells and whistles, our solution achieves 154 FPS on 12 cameras and 34 FPS on 28 cameras, indicating its ability to handle large-scale real-world applications. The proposed dataset will be released at https://github.com/longcw/crossview_3d_pose_tracking.
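The abstract describes matching each view's 2D poses directly against retained 3D poses "in 3-space" and then updating those poses. The following is a minimal illustrative sketch of that general idea, not the authors' released code. It rests on assumptions of our own: pinhole cameras with known K, R, t (x_cam = R X + t); a matching cost equal to the mean distance between each retained 3D joint and the back-projected ray of the corresponding 2D joint; greedy (not optimal) assignment per view; and plain linear (DLT) triangulation as the update. All function names and the `max_dist` threshold are hypothetical.

```python
import numpy as np


def project(K, R, t, X):
    """Pinhole projection of world point X, assuming x_cam = R @ X + t."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]


def backproject_ray(K, R, t, uv):
    """World-space ray (origin, unit direction) through pixel uv."""
    origin = -R.T @ t  # camera centre in world coordinates
    direction = R.T @ np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    return origin, direction / np.linalg.norm(direction)


def point_to_ray_distance(X, origin, direction):
    """Perpendicular distance from 3D point X to the line through origin."""
    v = X - origin
    return np.linalg.norm(v - np.dot(v, direction) * direction)


def match_view(tracks, detections, K, R, t, max_dist):
    """Greedily match one view's 2D poses to the retained 3D poses.

    Cost = mean point-to-ray distance over joints, measured in 3-space.
    Returns {track_index: detection_index}.
    """
    cost = np.full((len(tracks), len(detections)), np.inf)
    for i, pose in enumerate(tracks):
        for j, det in enumerate(detections):
            cost[i, j] = np.mean([point_to_ray_distance(X, *backproject_ray(K, R, t, uv))
                                  for X, uv in zip(pose, det)])
    matches = {}
    while cost.size:
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        if cost[i, j] > max_dist:  # also stops once every entry is inf
            break
        matches[int(i)] = int(j)
        cost[i, :] = np.inf
        cost[:, j] = np.inf
    return matches


def triangulate(projections, points2d):
    """Linear (DLT) triangulation of one joint from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, points2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    X = np.linalg.svd(np.asarray(rows))[2][-1]  # right singular vector of smallest singular value
    return X[:3] / X[3]


def update_tracks(tracks, frame, max_dist=0.2):
    """One step: match every view to the tracks, then re-triangulate each track.

    frame is a list of (K, R, t, detections), one entry per calibrated camera.
    """
    observations = [[] for _ in tracks]
    for K, R, t, detections in frame:
        P = K @ np.hstack([R, t.reshape(3, 1)])
        for ti, di in match_view(tracks, detections, K, R, t, max_dist).items():
            observations[ti].append((P, detections[di]))
    updated = []
    for pose, obs in zip(tracks, observations):
        if len(obs) < 2:  # not enough views: keep the previous estimate
            updated.append(pose)
            continue
        Ps = [P for P, _ in obs]
        updated.append(np.array([triangulate(Ps, [det[j] for _, det in obs])
                                 for j in range(len(pose))]))
    return updated


# Toy scene: two people with two joints each, seen by two cameras.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
cams = [(K, np.eye(3), np.array([0.0, 0, 5])), (K, np.eye(3), np.array([-1.0, 0, 5]))]
truth = [np.array([[0.0, 0, 0], [0, 0.5, 0]]), np.array([[1.0, 0, 0], [1, 0.5, 0]])]
previous = [truth[0] + [0.05, 0, 0], truth[1].copy()]  # slightly stale tracks
frame = [(K_, R_, t_, [np.array([project(K_, R_, t_, X) for X in p]) for p in truth[::-1]])
         for K_, R_, t_ in cams]
updated = update_tracks(previous, frame)
```

In the toy scene the detections are listed in the opposite order to the tracks, so the matching step has to recover the correspondence from geometry alone; the stale first track is then pulled back onto the true joint positions by triangulation. Greedy assignment keeps the sketch dependency-free; an optimal assignment (for example, the Hungarian algorithm) would be a natural substitution.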
doi_str_mv	10.1109/CVPR42600.2020.00334
format	conference_proceeding
eisbn	9781728171685; 1728171687
coden	IEEPAD
publisher	IEEE
date	2020-06
link	https://ieeexplore.ieee.org/document/9156586
identifier	EISSN: 2575-7075
ispartof	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, p.3276-3285
issn	2575-7075
language	eng
recordid	cdi_ieee_primary_9156586
source	IEEE Xplore All Conference Series
subjects	Cameras; Pose estimation; Target tracking; Task analysis; Three-dimensional displays; Two dimensional displays; Videos
title	Cross-View Tracking for Multi-Human 3D Pose Estimation at Over 100 FPS
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T05%3A37%3A46IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Cross-View%20Tracking%20for%20Multi-Human%203D%20Pose%20Estimation%20at%20Over%20100%20FPS&rft.btitle=2020%20IEEE/CVF%20Conference%20on%20Computer%20Vision%20and%20Pattern%20Recognition%20(CVPR)&rft.au=Chen,%20Long&rft.date=2020-06&rft.spage=3276&rft.epage=3285&rft.pages=3276-3285&rft.eissn=2575-7075&rft.coden=IEEPAD&rft_id=info:doi/10.1109/CVPR42600.2020.00334&rft.eisbn=9781728171685&rft.eisbn_list=1728171687&rft_dat=%3Cieee_CHZPO%3E9156586%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c254t-68b94b51f5b0c6b1d4c32c016912200822993648275c188dc72e1122f91660883%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=9156586&rfr_iscdi=true