Graph2Net: Perceptually-Enriched Graph Learning for Skeleton-Based Action Recognition

Skeleton representation has attracted a great deal of attention recently as an extremely robust feature for human action recognition. However, its non-Euclidean structural characteristics raise new challenges for conventional solutions. Recent studies have shown that there is a native superiority in modeling spatiotemporal skeleton information with a Graph Convolutional Network (GCN). Nevertheless, skeleton graph modeling normally focuses on the physical adjacency of the elements of the human skeleton sequence, which contrasts with the requirement to provide a perceptually meaningful representation. To address this problem, we propose a perceptually-enriched graph learning method that introduces innovative features to spatial and temporal skeleton graph modeling. For spatial modeling, we incorporate a Local-Global Graph Convolutional Network (LG-GCN) that builds a multifaceted spatial perceptual representation, helping to overcome the limitations caused by over-reliance on the spatial adjacency relationships in the skeleton. For temporal modeling, we present a Region-Aware Graph Convolutional Network (RA-GCN), which directly embeds the regional relationships conveyed by a skeleton sequence into a temporal graph model, mitigating the deficiency of the original skeleton graph models. In addition, we strengthen the ability of the proposed channel modeling methods to extract multi-scale representations. These innovations result in a lightweight graph convolutional model, referred to as Graph2Net, that simultaneously extends the spatial and temporal perceptual fields and thus enhances the capacity of the graph model to represent skeleton sequences. Extensive experiments on the NTU RGB+D 60 & 120, Northwestern-UCLA, and Kinetics-400 datasets show that our results surpass the performance of several mainstream methods while limiting model complexity and computational overhead.
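
The abstract positions Graph2Net as an extension of the standard GCN pipeline for skeleton sequences. For orientation, the sketch below shows the baseline operation that such models build on: a spatial graph convolution over a fixed joint-adjacency matrix, applied framewise to a skeleton tensor. This is an illustrative PyTorch sketch under assumed conventions (the names SpatialGraphConv and normalize_adjacency are ours); it is not the paper's LG-GCN or RA-GCN implementation.

import torch
import torch.nn as nn

def normalize_adjacency(A: torch.Tensor) -> torch.Tensor:
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    A = A + torch.eye(A.size(0))
    d_inv_sqrt = A.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

class SpatialGraphConv(nn.Module):
    # One spatial graph-convolution layer for skeleton sequences.
    # Input x: (N, C_in, T, V) -- batch, channels, frames, joints.
    def __init__(self, in_channels: int, out_channels: int, A: torch.Tensor):
        super().__init__()
        self.register_buffer("A_hat", normalize_adjacency(A))
        # The 1x1 convolution mixes channels; the adjacency matrix mixes joints.
        self.theta = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.theta(x)
        # Aggregate each joint's features from its physical neighbours:
        # out[n, c, t, w] = sum_v x[n, c, t, v] * A_hat[v, w]
        return torch.einsum("nctv,vw->nctw", x, self.A_hat)

# Example: a 25-joint skeleton (as in NTU RGB+D), 64 frames, 3 input channels.
A = torch.zeros(25, 25)  # set A[i, j] = 1 for physically connected joints
layer = SpatialGraphConv(3, 64, A)
out = layer(torch.randn(8, 3, 64, 25))  # -> shape (8, 64, 64, 25)

Because this baseline propagates information only along physical bone connections in space, and only within the same joint across time, it illustrates exactly the limitation the abstract targets: LG-GCN widens the spatial perceptual field beyond physical adjacency, and RA-GCN embeds regional relationships into the temporal graph.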

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-04, Vol. 32 (4), pp. 2120-2132
Main Authors: Wu, Cong; Wu, Xiao-Jun; Kittler, Josef
Format: Article
Language:English
Subjects: Convolution; Feature extraction; graph learning; Hidden Markov models; Human activity recognition; Human motion; Innovations; Learning; Representations; Skeleton; Skeleton-based action recognition; Spatial data; Spatiotemporal phenomena; Task analysis; Technological innovation
Publisher: New York: IEEE
DOI: 10.1109/TCSVT.2021.3085959
ISSN: 1051-8215
EISSN: 1558-2205
CODEN: ITCTEM