Learning Temporal-Spatial Contextual Adaptation for Three-Dimensional Human Pose Estimation
Published in: Sensors (Basel, Switzerland), 2024-07-08, Vol. 24 (13), p. 4422
Main Authors: Wang, Hexin; Quan, Wei; Zhao, Runjing; Zhang, Miaomiao; Jiang, Na
Format: Article
Language: English
Publisher: MDPI AG (Switzerland)
Identifiers: ISSN/EISSN 1424-8220; DOI 10.3390/s24134422; PMID 39001202
Subjects: 3D human pose estimation; Adaptation; Algorithms; batch variance loss; Computer vision; Deep learning; Design; dual-adaptive spatial-temporal model; Human mechanics; Humans; Imaging, Three-Dimensional - methods; Localization; one-more supervised training; Posture - physiology; Robotics - methods
Source: Publicly Available Content Database; PubMed Central
Rights: Open access; 2024 by the authors; distributed under the Creative Commons Attribution (CC BY) license (Licensee MDPI, Basel, Switzerland)
Online Access: https://doi.org/10.3390/s24134422
Description: Three-dimensional human pose estimation focuses on generating 3D pose sequences from 2D videos. It has enormous potential in human-robot interaction, remote sensing, virtual reality, and computer vision. Existing methods primarily focus on spatial or temporal encoding to achieve 3D pose inference; such architectures exploit the independent effects of spatial and temporal cues on 3D pose estimation while neglecting their spatial-temporal synergistic influence. To address this issue, this paper proposes a novel 3D pose estimation method with a dual-adaptive spatial-temporal former (DASTFormer) and an additional supervised training strategy. The DASTFormer contains attention-adaptive (AtA) and pure-adaptive (PuA) modes, which enhance 2D-to-3D pose inference by adaptively learning spatial-temporal effects, considering both their cooperative and independent influences. In addition, an additional supervised training scheme with a batch variance loss is proposed. Unlike the common training strategy, it performs a two-round parameter update on the same batch of data. Not only can it better explore the potential relationship between spatial-temporal encoding and 3D poses, but it can also alleviate the batch-size limitations imposed by graphics cards on transformer-based frameworks. Extensive experimental results show that the proposed method significantly outperforms most state-of-the-art approaches on the Human3.6M and HumanEva datasets.
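The two mechanisms named in the description can be made concrete with short sketches. Both are hypothetical reconstructions from the abstract alone, written in PyTorch: the class and function names, tensor shapes, and loss formulation are assumptions for illustration, not the authors' published implementation. First, a minimal sketch of how a block might adaptively balance spatial attention (across joints) with temporal attention (across frames), in the spirit of the dual-adaptive design; the learned gate standing in for the AtA/PuA adaptivity is an assumption:

```python
import torch
import torch.nn as nn

class DualAdaptiveBlock(nn.Module):
    """Hypothetical sketch: fuse spatial (per-frame, across joints) and
    temporal (per-joint, across frames) self-attention with a learned
    adaptive balance, in the spirit of the DASTFormer described above."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(2))  # adaptive spatial/temporal weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, J, C) = batch, frames, joints, channels
        B, T, J, C = x.shape
        xs = x.reshape(B * T, J, C)                  # attend across joints
        s, _ = self.spatial_attn(xs, xs, xs)
        s = s.reshape(B, T, J, C)
        xt = x.transpose(1, 2).reshape(B * J, T, C)  # attend across frames
        t, _ = self.temporal_attn(xt, xt, xt)
        t = t.reshape(B, J, T, C).transpose(1, 2)
        w = torch.softmax(self.gate, dim=0)          # learned balance of the two cues
        return x + w[0] * s + w[1] * t               # residual fusion of both effects
```

Second, a sketch of the additional supervised training: a two-round parameter update on the same batch, here paired with an assumed variance-penalizing loss (the paper's exact batch variance loss may differ):

```python
def batch_variance_loss(pred: torch.Tensor, gt: torch.Tensor) -> torch.Tensor:
    """Assumed form: mean per-sample joint error plus its variance across
    the batch, so the spread of errors is supervised as well as the mean."""
    err = (pred - gt).norm(dim=-1).mean(dim=(1, 2))  # (B,) per-sample joint error
    return err.mean() + err.var()

def two_round_step(model, optimizer, pose_2d, pose_3d):
    """Two-round parameter update on the same batch, as the abstract
    describes; reusing a batch also eases GPU-memory batch-size limits."""
    losses = []
    for _ in range(2):                               # same batch, two updates
        optimizer.zero_grad()
        loss = batch_variance_loss(model(pose_2d), pose_3d)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses
```

Both sketches assume the (batch, frames, joints, channels) layout common in 2D-to-3D lifting pipelines; the actual DASTFormer interfaces are not specified in this record.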