Loading…

Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution

Human Activity Recognition (HAR) is a field of study that focuses on identifying and classifying human activities. Skeleton-based Human Activity Recognition has received much attention in recent years, where Graph Convolutional Network (GCN) based method is widely used and has achieved remarkable re...

Full description

Saved in:
Bibliographic Details
Published in:arXiv.org 2024-07
Main Authors: Wang, Jingyao, Bergeret, Emmanuel, Falih, Issam
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Human Activity Recognition (HAR) is a field of study that focuses on identifying and classifying human activities. Skeleton-based Human Activity Recognition has received much attention in recent years, where Graph Convolutional Network (GCN) based method is widely used and has achieved remarkable results. However, the representation of skeleton data and the issue of over-smoothing in GCN still need to be studied. 1). Compared to central nodes, edge nodes can only aggregate limited neighbor information, and different edge nodes of the human body are always structurally related. However, the information from edge nodes is crucial for fine-grained activity recognition. 2). The Graph Convolutional Network suffers from a significant over-smoothing issue, causing nodes to become increasingly similar as the number of network layers increases. Based on these two ideas, we propose a two-stream graph convolution method called Spatial-Structural GCN (SpSt-GCN). Spatial GCN performs information aggregation based on the topological structure of the human body, and structural GCN performs differentiation based on the similarity of edge node sequences. The spatial connection is fixed, and the human skeleton naturally maintains this topology regardless of the actions performed by humans. However, the structural connection is dynamic and depends on the type of movement the human body is performing. Based on this idea, we also propose an entirely data-driven structural connection, which greatly increases flexibility. We evaluate our method on two large-scale datasets, i.e., NTU RGB+D and NTU RGB+D 120. The proposed method achieves good results while being efficient.
ISSN:2331-8422