
Cross-Modality Compensation Convolutional Neural Networks for RGB-D Action Recognition

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2022-03, Vol. 32 (3), p. 1498-1509
Main Authors: Cheng, Jun, Ren, Ziliang, Zhang, Qieshi, Gao, Xiangyang, Hao, Fusheng
Format: Article
Language:English
Description
Summary: RGB-D-based human action recognition has attracted much attention recently because it provides more complementary information than a single modality. However, it is difficult for the two modalities to effectively learn spatial-temporal information from each other. To facilitate information interaction between different modalities, a cross-modality compensation convolutional neural network (ConvNet) is proposed for human action recognition, which enhances discriminative ability by jointly learning compensation features from the RGB and depth modalities. Moreover, we design a cross-modality compensation block (CMCB) to extract compensation features from the RGB and depth modalities. Specifically, the CMCB is incorporated into two typical network architectures, ResNet and VGG, to verify its ability to improve model performance. The proposed architecture has been evaluated on three challenging datasets: NTU RGB+D 120, THU-READ, and PKU-MMD. We experimentally verify that the proposed model with the CMCB is effective for different input types, such as pairs of raw images and dynamic images constructed from entire RGB-D sequences, and the results show that the proposed framework achieves state-of-the-art performance on all three datasets.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2021.3076165
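
The abstract describes the cross-modality compensation block (CMCB) only at a high level, so the following is a minimal, hypothetical PyTorch sketch of what such a block could look like, assuming a residual-style exchange in which each stream receives a projected copy of the other stream's feature map. The class name, 1x1-convolution projection, and channel layout are illustrative assumptions, not the authors' published implementation.

# Hypothetical sketch of a cross-modality compensation block.
# The record does not give the paper's exact design; this only illustrates
# the general idea of letting RGB and depth ConvNet streams exchange
# learned compensation features at an intermediate layer.
import torch
import torch.nn as nn


class CrossModalityCompensationBlock(nn.Module):
    """Exchanges compensation features between an RGB and a depth stream.

    Each modality's feature map is projected with a 1x1 convolution and
    added to the other modality's features, so each stream can exploit
    complementary cues from its counterpart (an assumed structure, not
    the authors' exact CMCB).
    """

    def __init__(self, channels: int):
        super().__init__()
        self.rgb_to_depth = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.depth_to_rgb = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor):
        # Residual-style compensation: each stream receives a projected
        # copy of the other stream's feature map before the next stage.
        rgb_out = self.relu(rgb_feat + self.depth_to_rgb(depth_feat))
        depth_out = self.relu(depth_feat + self.rgb_to_depth(rgb_feat))
        return rgb_out, depth_out


if __name__ == "__main__":
    # Toy usage with intermediate ResNet-style feature maps from both streams.
    block = CrossModalityCompensationBlock(channels=256)
    rgb = torch.randn(2, 256, 28, 28)
    depth = torch.randn(2, 256, 28, 28)
    rgb_c, depth_c = block(rgb, depth)
    print(rgb_c.shape, depth_c.shape)  # both torch.Size([2, 256, 28, 28])

Such a block could, in principle, be inserted after any stage of a two-stream ResNet or VGG backbone, which matches the abstract's claim that the CMCB is evaluated inside both architectures; where and how often it is inserted in the actual paper is not stated in this record.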