
Orientation Cues-Aware Facial Relationship Representation for Head Pose Estimation via Transformer

Head pose estimation (HPE) is an indispensable upstream task in the fields of human-machine interaction, self-driving, and attention detection. However, practical head pose applications suffer from several challenges, such as severe occlusion, low illumination, and extreme orientations. To address these challenges, we identify three cues from head images, namely, critical minority relationships, neighborhood orientation relationships, and significant facial changes. On the basis of these three cues, two key insights on head poses are revealed: 1) the intra-orientation relationship and 2) the cross-orientation relationship. To leverage the two key insights above, a novel relationship-driven method is proposed based on the Transformer architecture, in which facial and orientation relationships can be learned. Specifically, we design several orientation tokens to explicitly encode basic orientation regions. In addition, a novel token-guide multi-loss function is designed to guide the orientation tokens as they learn the desired regional similarities and relationships. Experimental results on three challenging benchmark HPE datasets show that our proposed TokenHPE achieves state-of-the-art performance. Moreover, qualitative visualizations are provided to verify the effectiveness of the token-learning methodology.
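The abstract describes learnable orientation tokens that are processed jointly with image features by a Transformer encoder. Below is a minimal, hypothetical PyTorch sketch of that general idea; it is not the authors' TokenHPE implementation, and the token count, encoder depth, pooling, and regression head are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class OrientationTokenEncoder(nn.Module):
    """Sketch of the orientation-token idea: learnable tokens, one per
    coarse orientation region, are prepended to patch embeddings and
    refined jointly with them by a standard Transformer encoder.
    NOT the paper's TokenHPE; hyperparameters here are assumptions."""

    def __init__(self, embed_dim=256, num_orient_tokens=9, depth=4, num_heads=8):
        super().__init__()
        # Hypothetical choice: nine tokens for a 3x3 grid of yaw/pitch
        # regions; the paper only says "several orientation tokens".
        self.orient_tokens = nn.Parameter(torch.zeros(1, num_orient_tokens, embed_dim))
        nn.init.trunc_normal_(self.orient_tokens, std=0.02)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Simple head regressing (yaw, pitch, roll) from the pooled
        # orientation tokens; again an assumption, not the paper's head.
        self.pose_head = nn.Linear(embed_dim, 3)

    def forward(self, patch_embeddings):
        # patch_embeddings: (B, N, D) features from any visual backbone.
        b = patch_embeddings.size(0)
        tokens = self.orient_tokens.expand(b, -1, -1)
        x = torch.cat([tokens, patch_embeddings], dim=1)
        x = self.encoder(x)
        # Keep the refined orientation tokens; a token-guide multi-loss
        # (as in the abstract) would add extra terms supervising their
        # regional similarities, omitted here for brevity.
        refined = x[:, : self.orient_tokens.size(1)]  # (B, T, D)
        return self.pose_head(refined.mean(dim=1)), refined


# Usage sketch: 196 patch tokens from a 14x14 backbone feature map.
if __name__ == "__main__":
    model = OrientationTokenEncoder()
    patches = torch.randn(2, 196, 256)
    angles, orient = model(patches)
    print(angles.shape, orient.shape)  # torch.Size([2, 3]) torch.Size([2, 9, 256])
```

The design point the sketch illustrates is that the orientation tokens attend to every patch in every layer, so each token can aggregate the facial evidence relevant to its orientation region rather than relying on a single global feature.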

Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2023, Vol. 32, pp. 6289-6302
Main Authors: Liu, Hai; Zhang, Cheng; Deng, Yongjian; Liu, Tingting; Zhang, Zhaoli; Li, You-Fu
Format: Article
Language: English
Subjects: attention mechanism; computer architecture; deep learning; head pose estimation; occlusion; orientation relationships; pose estimation; relationship perception; semantics; task analysis; Transformers; visualization
DOI: 10.1109/TIP.2023.3331309
ISSN: 1057-7149
EISSN: 1941-0042