Human-object interaction detection with depth-augmented clues

Bibliographic Details
Published in: Neurocomputing (Amsterdam), 2022-08, Vol. 500, p. 978-988
Main Authors: Cheng, Yamin, Duan, Hancong, Wang, Chen, Wang, Zhi
Format: Article
Language: English
Description
Summary: Human-object interaction (HOI) detection aims to localize and classify triplets of human, object, and relationship in a given image. Unlike previous methods that extract visual information only from RGB images, we propose a Depth-augmented Relationship Reasoning (DRR) method that attends to RGB images and their corresponding depth information simultaneously. Revisiting the principles of photography, we argue that RGB images discard spatial depth, which carries the third-dimension relative distance between instances. We therefore estimate depth for each image in advance, yielding a corresponding depth map. We then leverage multiple representations encoded from the depth information and the RGB images to enrich semantic interpretation. Subsequently, we explore a hierarchical attention strategy that fuses these semantic representations into depth-augmented features, which are used to reason about fine-grained human-object interactions. Extensive experiments on the benchmark datasets V-COCO, HICO-DET, and HCVRD verify the effectiveness of our method and demonstrate the importance of spatial depth information for HOI detection.
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2022.05.014