
Learning Contraction Policies From Offline Data

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, 2022-04, Vol. 7 (2), pp. 2905-2912
Main Authors: Rezazadeh, Navid, Kolarich, Maxwell, Kia, Solmaz S., Mehr, Negar
Format: Article
Language:English
Description
Summary: This letter proposes a data-driven method for learning convergent control policies from offline data using contraction theory. Contraction theory enables constructing a policy that makes the closed-loop system trajectories inherently convergent towards a unique trajectory. At the technical level, identifying the contraction metric, which is the distance metric with respect to which a robot's trajectories exhibit contraction, is often non-trivial. We propose to jointly learn the control policy and its corresponding contraction metric while enforcing contraction. To achieve this, we learn an implicit dynamics model of the robotic system from an offline data set consisting of the robot's state and input trajectories. We propose a data augmentation algorithm for learning contraction policies using this learned dynamics model: we randomly generate samples in the state space and propagate them forward in time through the learned dynamics model to generate auxiliary sample trajectories. We then learn both the control policy and the contraction metric such that the distance between the trajectories from the offline data set and the generated auxiliary trajectories decreases over time. We evaluate the performance of our proposed framework on simulated robotic goal-reaching tasks and demonstrate that enforcing contraction results in faster convergence and greater robustness of the learned policy.
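For context beyond what this record shows: in classical contraction theory (Lohmiller and Slotine), a system \dot{x} = f(x, t) is contracting with rate \lambda > 0 with respect to a uniformly positive definite metric M(x, t) if

    \dot{M} + M \frac{\partial f}{\partial x} + \left( \frac{\partial f}{\partial x} \right)^{\top} M \preceq -2 \lambda M

Under this condition the differential distance \delta x^{\top} M \, \delta x decays as e^{-2 \lambda t}, so any two trajectories converge exponentially to one another. The letter's data-driven formulation may differ in detail from this classical statement.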
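As an illustration of the training idea the summary describes, here is a minimal sketch in Python/PyTorch. It assumes a pretrained dynamics model and uses placeholder data; the network sizes, names, and the per-step discounted-distance loss are assumptions made for this sketch, not the authors' implementation.

# Illustrative sketch only: jointly learn a policy and a contraction metric
# from offline data, using a learned dynamics model to generate auxiliary
# trajectories. Names, sizes, and the loss form are assumptions, not the
# authors' code.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, HORIZON, GAMMA = 4, 2, 10, 0.95  # GAMMA: per-step shrink factor

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 64), nn.Tanh(), nn.Linear(64, n_out))

dynamics = mlp(STATE_DIM + ACTION_DIM, STATE_DIM)   # stands in for the learned model
policy = mlp(STATE_DIM, ACTION_DIM)                 # control policy u = pi(x)
metric_net = mlp(STATE_DIM, STATE_DIM * STATE_DIM)  # produces a factor L(x) of M(x)
for p in dynamics.parameters():                     # the dynamics model is assumed
    p.requires_grad_(False)                         # pretrained and held fixed here

def metric(x):
    # M(x) = L(x) L(x)^T + eps*I is positive definite by construction.
    L = metric_net(x).reshape(-1, STATE_DIM, STATE_DIM)
    return L @ L.transpose(-1, -2) + 1e-3 * torch.eye(STATE_DIM)

def rollout(x0):
    # Propagate sampled states forward through the learned dynamics under the policy.
    traj = [x0]
    for _ in range(HORIZON):
        x = traj[-1]
        traj.append(x + dynamics(torch.cat([x, policy(x)], dim=-1)))
    return torch.stack(traj)  # (HORIZON+1, batch, STATE_DIM)

def contraction_loss(traj_a, traj_b):
    # Penalize metric distances d_t = delta_t^T M(x_t) delta_t that fail to
    # shrink by at least a factor GAMMA at every step.
    delta = traj_a - traj_b
    M = metric(traj_a.reshape(-1, STATE_DIM)).reshape(
        HORIZON + 1, -1, STATE_DIM, STATE_DIM)
    d = torch.einsum('tbi,tbij,tbj->tb', delta, M, delta)
    return torch.relu(d[1:] - GAMMA * d[:-1]).mean()

opt = torch.optim.Adam(
    list(policy.parameters()) + list(metric_net.parameters()), lr=1e-3)
for step in range(1000):
    traj_offline = torch.randn(HORIZON + 1, 32, STATE_DIM)      # placeholder offline batch
    x_aug = traj_offline[0] + 0.5 * torch.randn(32, STATE_DIM)  # perturbed auxiliary starts
    loss = contraction_loss(traj_offline, rollout(x_aug))
    opt.zero_grad()
    loss.backward()
    opt.step()

The relu(d_{t+1} - GAMMA * d_t) term is one simple way to encode the requirement that the metric distance between offline and auxiliary trajectories decreases over time; the actual loss and metric parameterization used in the letter are not given on this record.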
ISSN: 2377-3766
DOI: 10.1109/LRA.2022.3145100