
Two-stream adaptive-attentional subgraph convolution networks for skeleton-based action recognition

Bibliographic Details
Published in: Multimedia Tools and Applications, 2022-02, Vol. 81 (4), p. 4821-4838
Main Authors: Li, Xianshan, Meng, Fengchan, Zhao, Fengda, Guo, Dingding, Lou, Fengwei, Jing, Rong
Format: Article
Language:English
Description
Summary: Recently, skeleton-based action recognition has modeled the human skeleton as a graph and processed it with graph convolutional networks (GCNs), achieving remarkable results. However, most methods convolve directly over the whole graph, ignoring that the human skeleton is composed of multiple body parts, and therefore cannot accomplish the task well. We observe that the physical properties of bones (i.e., length and direction) provide discriminative information that helps to build a multi-level network structure effectively. Because existing methods treat the channel domain and the spatial domain with equal importance, many computing resources are wasted on negligible features. In this paper, we modify the Convolutional Block Attention Module (CBAM) and apply it to the adaptive network. By capturing the implicit weighting information in the channel and spatial domains, the network can focus more attention on the key channels and nodes. A new two-stream adaptive-attentional subgraph convolution network (2s-AASGCN) is proposed to extract features in the spatio-temporal domain. We validate 2s-AASGCN on two skeleton datasets, NTU-RGB+D60 and NTU-RGB+D120, and our model achieves excellent results on both.
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-11026-4
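
Illustrative sketch (not from the article): the abstract describes CBAM-style channel and spatial attention applied to skeleton features so the network can emphasize key channels and nodes. Below is a minimal PyTorch sketch of that general idea for a feature tensor of shape (N, C, T, V) — batch, channels, frames, joints. The module names, reduction ratio, and kernel size are assumptions for illustration; this is not the authors' 2s-AASGCN implementation.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style channel attention: pool over frames and joints,
    pass the pooled descriptors through a shared bottleneck MLP,
    and re-weight channels with a sigmoid gate."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                       # x: (N, C, T, V)
        avg = self.mlp(x.mean(dim=(2, 3)))      # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))       # max-pooled descriptor
        gate = torch.sigmoid(avg + mx)[..., None, None]
        return x * gate                         # emphasize key channels

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over the (frame, joint) grid,
    letting the network emphasize key nodes and time steps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                       # x: (N, C, T, V)
        avg = x.mean(dim=1, keepdim=True)       # (N, 1, T, V)
        mx = x.amax(dim=1, keepdim=True)        # (N, 1, T, V)
        gate = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * gate                         # emphasize key nodes

if __name__ == "__main__":
    # Toy feature map: 8 clips, 64 channels, 300 frames, 25 joints
    # (25 joints per body matches the NTU-RGB+D skeleton layout).
    feats = torch.randn(8, 64, 300, 25)
    feats = ChannelAttention(64)(feats)
    feats = SpatialAttention()(feats)
    print(feats.shape)                          # torch.Size([8, 64, 300, 25])

In a two-stream setup of the kind the abstract mentions, a second branch would typically receive bone features (e.g., differences between connected joint coordinates, encoding length and direction) and pass them through the same attention-augmented blocks before the two streams' scores are fused; the exact fusion used in the article is not specified here.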