
Interactive Visual Task Learning for Robots


Bibliographic Details
Main Authors: Gu, Weiwei, Sah, Anant, Gopalan, Nakul
Format: Conference Proceeding
Language: English
Description
Summary: We present a demonstrable framework for robots to learn novel visual concepts and visual tasks via in-situ linguistic interactions with human users. Previous approaches in computer vision have either used large pre-trained visual models to infer novel objects zero-shot, or added novel concepts along with their attributes and representations to a concept hierarchy. We extend the approaches that focus on learning visual concept hierarchies, and take this ability one step further to demonstrate novel task solving on robots along with the learned visual concepts. To enable a visual concept learner to solve robotics tasks one-shot, we develop two distinct techniques. First, we propose a novel approach, Hi-Viscont (HIerarchical VISual CONcept learner for Task), which propagates information about a novel concept being taught to its parent nodes within a concept hierarchy. This information propagation allows all concepts in the hierarchy to update as novel concepts are taught in a continual learning setting. Second, we represent a visual task as a scene graph with language annotations, which allows us to create novel permutations of a demonstrated task zero-shot and in-situ. Combining the two techniques, we present a demonstration on a real robot that learns visual tasks and concepts one-shot from in-situ interactions with human users, and generalizes to perform a novel visual task of the same type zero-shot. As shown by the studies in the main conference paper, our system achieves a success rate of 50% on solving the whole task correctly with generalization, whereas the baseline performs at 14% without any ability to generalize to novel tasks and concepts. We will demonstrate our working interactive learning pipeline in person at AAAI 2024 with our robot and other required hardware.
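To illustrate the first idea described above, the following is a minimal Python sketch (not the authors' implementation) of a concept hierarchy in which teaching a novel concept propagates information upward to its parent nodes. The node names, feature vectors, and the mean-of-children update rule are illustrative assumptions, not details from the paper.

# Illustrative sketch: upward information propagation in a concept hierarchy.
# Teaching a new leaf concept refreshes the representation of its ancestors,
# so the whole hierarchy stays up to date in a continual learning setting.
from dataclasses import dataclass, field
from typing import Optional, List

@dataclass
class ConceptNode:
    name: str
    feature: List[float]                       # visual representation of the concept
    parent: Optional["ConceptNode"] = None
    children: List["ConceptNode"] = field(default_factory=list)

    def add_child(self, child: "ConceptNode") -> None:
        child.parent = self
        self.children.append(child)
        child.propagate_to_ancestors()

    def propagate_to_ancestors(self) -> None:
        # Walk up the hierarchy and refresh each ancestor's feature as the
        # mean of its children's features (a stand-in update rule).
        node = self.parent
        while node is not None:
            dims = len(node.feature)
            node.feature = [
                sum(c.feature[d] for c in node.children) / len(node.children)
                for d in range(dims)
            ]
            node = node.parent

# Teaching a novel concept ("mug") updates its parent ("container") in place.
root = ConceptNode("object", [0.0, 0.0])
container = ConceptNode("container", [0.2, 0.8])
root.add_child(container)
container.add_child(ConceptNode("mug", [0.6, 0.4]))
print(container.feature)    # parent representation refreshed after teaching "mug"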
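The second idea, representing a demonstrated task as a scene graph with language annotations so that novel permutations can be produced zero-shot, can be sketched as follows. Again this is a hedged, hypothetical illustration; the node keys, relation names, and the permute method are assumptions made for the example.

# Illustrative sketch: a visual task as a scene graph with language annotations.
# Re-binding a node to a different learned concept yields a novel permutation
# of the demonstrated task without a new demonstration.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class SceneNode:
    annotation: str           # language annotation, e.g. "a red block"
    concept: str              # grounded visual concept, e.g. "red_block"

@dataclass
class TaskGraph:
    nodes: Dict[str, SceneNode]
    relations: List[Tuple[str, str, str]]     # (subject, relation, object)

    def permute(self, bindings: Dict[str, str]) -> "TaskGraph":
        # Re-bind selected nodes to different learned concepts, keeping the
        # task structure (relations) fixed: a zero-shot task permutation.
        new_nodes = {
            key: SceneNode(node.annotation, bindings.get(key, node.concept))
            for key, node in self.nodes.items()
        }
        return TaskGraph(new_nodes, list(self.relations))

# A demonstrated "place A on B" task, then a permutation using a novel concept.
demo = TaskGraph(
    nodes={"A": SceneNode("a red block", "red_block"),
           "B": SceneNode("a blue plate", "blue_plate")},
    relations=[("A", "on_top_of", "B")],
)
novel = demo.permute({"A": "green_mug"})
print(novel.nodes["A"].concept, novel.relations)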
ISSN: 2159-5399, 2374-3468
DOI: 10.1609/aaai.v38i21.30567