Demonstrating Multi-modal Human Instruction Comprehension with AR Smart Glass
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Summary: We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual and pointing gesture cues. Our prototype includes an AR smart-glass for issuing the instructions and a Jetson TX2 pervasive device for executing comprehension algorithms. With this setup, we enable on-device, computationally efficient object acquisition task comprehension with an average latency in the range of 150-330 msec.
ISSN: 2155-2509
DOI: 10.1109/COMSNETS56262.2023.10041269