
LASOR: Learning Accurate 3D Human Pose and Shape via Synthetic Occlusion-Aware Data and Neural Mesh Rendering


Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2022, Vol. 31, pp. 1938-1948
Main Authors: Yang, Kaibing, Gu, Renshu, Wang, Maoyu, Toyoura, Masahiro, Xu, Gang
Format: Article
Language:English
Description: A key challenge in the task of human pose and shape estimation is occlusion, including self-occlusions, object-human occlusions, and inter-person occlusions. The lack of diverse and accurate pose and shape training data becomes a major bottleneck, especially for scenes with occlusions in the wild. In this paper, we focus on the estimation of human pose and shape in the case of inter-person occlusions, while also handling object-human occlusions and self-occlusion. We propose a novel framework that synthesizes occlusion-aware silhouette and 2D keypoint data and directly regresses to the SMPL pose and shape parameters. A neural 3D mesh renderer is exploited to enable silhouette supervision on the fly, which contributes to great improvements in shape estimation. In addition, keypoints-and-silhouette-driven training data in panoramic viewpoints are synthesized to compensate for the lack of viewpoint diversity in existing datasets. Experimental results show that our method is among the state of the art on the 3DPW and 3DPW-Crowd datasets in terms of pose estimation accuracy. The proposed method clearly outperforms Mesh Transformer, 3DCrowdNet and ROMP in terms of shape estimation. Top performance is also achieved on SSP-3D in terms of shape prediction accuracy. Demo and code will be available at https://igame-lab.github.io/LASOR/ .
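The silhouette supervision described in the abstract amounts to penalizing the mismatch between a (differentiably) rendered body silhouette and a ground-truth mask. As a rough illustration only, the soft-IoU loss below is one common formulation for such supervision; it is an assumption on our part, not the paper's exact loss, and a real pipeline would compute it on the renderer's soft silhouette output inside an autodiff framework.

```python
import numpy as np

def soft_iou_loss(rendered, target, eps=1e-6):
    """Soft-IoU silhouette loss.

    rendered: float array in [0, 1], e.g. the soft silhouette produced
              by a neural mesh renderer for the current SMPL estimate
    target:   binary ground-truth silhouette mask
    Returns 1 - IoU, so perfect overlap gives a loss of 0.
    """
    rendered = rendered.astype(np.float64)
    target = target.astype(np.float64)
    intersection = np.sum(rendered * target)
    # Soft union: |A| + |B| - |A ∩ B|, computed per pixel.
    union = np.sum(rendered + target - rendered * target)
    return 1.0 - intersection / (union + eps)

# Toy example: two overlapping 4x4 square silhouettes on an 8x8 canvas.
pred = np.zeros((8, 8)); pred[2:6, 2:6] = 1.0
gt = np.zeros((8, 8)); gt[3:7, 3:7] = 1.0
loss = soft_iou_loss(pred, gt)  # intersection 9, union 23 -> loss = 1 - 9/23
```

Because the rendered silhouette is a smooth function of the mesh vertices in a neural mesh renderer, a loss of this form can be backpropagated to the SMPL pose and shape parameters, which is what makes on-the-fly silhouette supervision possible.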
DOI: 10.1109/TIP.2022.3149229
PMID: 35143398
ISSN: 1057-7149
EISSN: 1941-0042
Source: IEEE Electronic Library (IEL) Journals
Subjects:
2D keypoint
3D human pose and shape estimation
Biological system modeling
Cameras
Datasets
Humans
Imaging, Three-Dimensional - methods
neural mesh renderer
Occlusion
occlusion-aware
Pose estimation
Shape
silhouette
Solid modeling
Surgical Mesh
Synthesis
Three-dimensional displays
Training
Training data