
SHPR-Net: Deep Semantic Hand Pose Regression From Point Clouds

3-D hand pose estimation is an essential problem for human-computer interaction. Most existing depth-based hand pose estimation methods consume a 2-D depth map or a 3-D volume via 2-D/3-D convolutional neural networks. In this paper, we propose a deep semantic hand pose regression network (SHPR-Net) for hand pose estimation from point sets, which consists of two subnetworks: a semantic segmentation subnetwork and a hand pose regression subnetwork. The semantic segmentation network assigns a semantic label to each point in the point set. The pose regression network integrates the semantic priors through both input and late fusion strategies and regresses the final hand pose. Two transformation matrices are learned from the point set and applied to transform the input point cloud and inversely transform the output pose, respectively, which makes SHPR-Net more robust to geometric transformations. Experiments on the NYU, ICVL, and MSRA hand pose data sets demonstrate that SHPR-Net achieves performance on par with state-of-the-art methods. We also show that our method can be naturally extended to hand pose estimation from multi-view depth data and achieves further improvement on the NYU data set.
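
To make the two-subnetwork design easier to picture, here is a minimal PyTorch sketch of the pipeline the abstract describes: a learned transform canonicalizes the cloud, a segmentation branch produces per-point semantics, those semantics are fused into the regression branch at the input and again late, and the predicted pose is inversely transformed back. Everything concrete below (layer widths, PointNet-style shared MLPs via 1-D convolutions, 6 hand parts, 21 joints) is an assumption for illustration, not the authors' exact architecture.

```python
# Hedged sketch of the SHPR-Net control flow; layer sizes and part/joint
# counts are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class TNet(nn.Module):
    """Predicts a 3x3 matrix from the point cloud, playing the role of
    the learned input/output transforms described in the abstract."""

    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(1024, 256), nn.ReLU(),
            nn.Linear(256, 9),
        )

    def forward(self, pts):                        # pts: (B, 3, N)
        feat = self.mlp(pts).max(dim=2).values     # global max pooling
        mat = self.fc(feat).view(-1, 3, 3)
        # Bias toward the identity so training starts near a no-op.
        return mat + torch.eye(3, device=pts.device)


class SHPRNetSketch(nn.Module):
    def __init__(self, num_parts=6, num_joints=21):
        super().__init__()
        self.num_joints = num_joints
        self.tnet = TNet()
        # Semantic segmentation subnetwork: per-point part scores.
        self.seg = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, num_parts, 1),
        )
        # Pose regression subnetwork. "Input fusion": per-point semantics
        # are concatenated with xyz coordinates before feature extraction.
        self.reg_point = nn.Sequential(
            nn.Conv1d(3 + num_parts, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        # "Late fusion": a pooled semantic summary joins the global
        # feature right before the final regression layers.
        self.reg_head = nn.Sequential(
            nn.Linear(1024 + num_parts, 512), nn.ReLU(),
            nn.Linear(512, num_joints * 3),
        )

    def forward(self, pts):                        # pts: (B, 3, N)
        T = self.tnet(pts)                         # learned input transform
        pts_t = torch.bmm(T, pts)                  # canonicalize the cloud
        sem = self.seg(pts_t).softmax(dim=1)       # (B, num_parts, N)
        fused = torch.cat([pts_t, sem], dim=1)     # input fusion
        feat = self.reg_point(fused).max(dim=2).values
        sem_global = sem.mean(dim=2)               # pooled semantic prior
        joints = self.reg_head(torch.cat([feat, sem_global], dim=1))
        joints = joints.view(-1, self.num_joints, 3)
        # Inversely transform the pose back to the original frame; this
        # pairing is what gives robustness to geometric transformations.
        return torch.bmm(joints, torch.inverse(T).transpose(1, 2))


if __name__ == "__main__":
    model = SHPRNetSketch()
    cloud = torch.randn(2, 3, 1024)    # two clouds of 1024 points each
    print(model(cloud).shape)          # torch.Size([2, 21, 3])
```

The late fusion here is just a mean-pooled part distribution; the paper may fuse richer semantic features, but the control flow (segment, fuse, regress, inverse-transform) follows the abstract.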

Bibliographic Details
Published in: IEEE Access, 2018-01, Vol. 6, pp. 43425-43439
Main Authors: Chen, Xinghao; Wang, Guijin; Zhang, Cairong; Kim, Tae-Kyun; Ji, Xiangyang
Format: Article
Language: English
Subjects: Artificial neural networks; Datasets; Deep learning; Feature extraction; Geometric transformation; Hand pose estimation; Human-computer interaction; Machine learning; Point cloud; Pose estimation; Regression; Semantic segmentation; Semantics; Three-dimensional models; Three-dimensional displays; Transforms; Two-dimensional displays
DOI: 10.1109/ACCESS.2018.2863540
ISSN: 2169-3536
Publisher: IEEE, Piscataway
Online Access: IEEE Xplore Open Access Journals (open access)