Learning Temporal-Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation

Three-dimensional human pose estimation generates 3D pose sequences from 2D videos and has enormous potential in human-robot interaction, remote sensing, virtual reality, and computer vision. Existing state-of-the-art methods primarily explore spatial or temporal encoding to achieve 3D pose inference, but these architectures exploit the independent effects of spatial and temporal cues while neglecting their synergistic influence. To address this issue, this paper proposes a novel 3D pose estimation method built on a dual-adaptive spatial-temporal former (DASTFormer) and an additional supervised training scheme. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance 2D-to-3D pose inference by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. In addition, an additional supervised training with a batch variance loss is proposed: unlike the common training strategy, a two-round parameter update is conducted on the same batch of data. This not only better explores the potential relationship between spatial-temporal encoding and 3D poses but also alleviates the batch-size limitations that graphics cards impose on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.
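
The abstract describes two concrete techniques: a two-round parameter update on the same batch with a batch variance loss, and adaptive spatial-temporal weighting inside DASTFormer. The paper's code is not part of this record, so the following is a minimal PyTorch sketch of how the two-round update could look; the MPJPE-style base loss, the variance formulation, the helper names (`batch_variance_loss`, `two_round_step`), and the weight `lam` are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch, NOT the authors' implementation: a two-round parameter
# update on the same batch, adding a batch variance penalty in round two,
# as the abstract describes. The MPJPE-style base loss, the variance
# formulation, and the weight `lam` are assumptions for illustration.
import torch


def batch_variance_loss(pred3d: torch.Tensor, gt3d: torch.Tensor) -> torch.Tensor:
    """Variance of per-sample pose errors across the batch (assumed form).

    pred3d, gt3d: (batch, frames, joints, 3); batch size must be > 1.
    """
    per_sample_err = (pred3d - gt3d).norm(dim=-1).mean(dim=(1, 2))  # (batch,)
    return per_sample_err.var()


def two_round_step(model, optimizer, pose2d, gt3d, lam=0.1):
    # Round 1: ordinary supervised update with a mean per-joint error loss.
    pred = model(pose2d)
    loss1 = (pred - gt3d).norm(dim=-1).mean()
    optimizer.zero_grad()
    loss1.backward()
    optimizer.step()

    # Round 2: re-run the SAME batch through the just-updated weights,
    # add the batch variance term, and take a second optimizer step.
    pred = model(pose2d)
    loss2 = (pred - gt3d).norm(dim=-1).mean() + lam * batch_variance_loss(pred, gt3d)
    optimizer.zero_grad()
    loss2.backward()
    optimizer.step()
    return loss1.item(), loss2.item()
```

The abstract attributes the adaptive spatial-temporal mixing to the AtA and PuA modes without specifying their form here; one hypothetical reading, again only a sketch under stated assumptions, is a learned gate over concatenated spatial and temporal features:

```python
import torch
import torch.nn as nn


class DualAdaptiveFusion(nn.Module):
    """Hypothetical gate mixing spatial and temporal encodings.

    This is one plausible reading of "adaptively learning spatial-temporal
    effects"; the actual AtA/PuA designs are specified only in the paper.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, feat_spatial: torch.Tensor, feat_temporal: torch.Tensor) -> torch.Tensor:
        # alpha in (0, 1) decides, per token and per channel, how much the
        # spatial cue contributes relative to the temporal cue.
        alpha = self.gate(torch.cat([feat_spatial, feat_temporal], dim=-1))
        return alpha * feat_spatial + (1.0 - alpha) * feat_temporal
```

Both sketches assume features shaped (batch, frames, joints, channels) and operate on the last dimension; the published paper remains the authoritative description of DASTFormer.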

Bibliographic Details
Published in: Sensors (Basel, Switzerland), 2024-07, Vol. 24 (13), p. 4422
Main Authors: Wang, Hexin; Quan, Wei; Zhao, Runjing; Zhang, Miaomiao; Jiang, Na
Format: Article
Language: English
Subjects: 3D human pose estimation; Adaptation; Algorithms; batch variance loss; Computer vision; Deep learning; Design; dual-adaptive spatial-temporal model; Human mechanics; Humans; Imaging, Three-Dimensional - methods; Localization; one-more supervised training; Posture - physiology; Robotics - methods
Publisher: MDPI AG, Switzerland
DOI: 10.3390/s24134422
ISSN: 1424-8220
PMID: 39001202
Online Access: https://doi.org/10.3390/s24134422