Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Bibliographic Details
Published in:	arXiv.org 2021-06
Main Authors:	Xie, Mandy; Li, Anqi; Van Wyk, Karl; Dellaert, Frank; Boots, Byron; Ratliff, Nathan
Format:	Article
Language:	English
Subjects:	Cloning; Collocation; Degrees of freedom; Learning; Optimal control; Optimization; Robot arms

Description
Summary:	Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors.
ISSN:	2331-8422