DEMRL: Dynamic estimation meta reinforcement learning for path following on unseen unmanned surface vehicle

Bibliographic Details
Published in:	Ocean engineering 2023-11, Vol.288, p.115958, Article 115958
Main Authors:	Jin, Kefan, Zhu, Hao, Gao, Rui, Wang, Jian, Wang, Hongdong, Yi, Hong, Richard Shi, C.-J.
Format:	Article
Language:	English
Subjects:	Meta reinforcement learning; Path following control; Policy learning; Unmanned surface vehicle

Description
Summary:	Reinforcement learning has been widely used for unmanned surface vehicle (USV) control tasks. However, its need for numerous training samples limits its transferability to new USVs. In this article, we propose a dynamic estimation meta reinforcement learning (DEMRL) approach that enables few-shot learning of the path following control policy. We first present a dynamic estimation method that learns a latent dynamic context feature. The learned context captures the hidden information of the USV dynamics from only a few estimation samples. We then propose a meta reinforcement learning based training framework to learn a generalizable path following control policy. Given the prior knowledge from the dynamic context, the well-trained policy can then easily adapt to the target USV during the rapid adaptation process. The proposed method represents an initial effort to tackle the few-shot learning challenge in training reinforcement learning based USV path following policies. Extensive experiments demonstrate that the proposed method achieves promising path following performance on unseen USVs with very little training data and training volume.
• To the best of our knowledge, this is the first study to address the few-shot learning problem in training RL-based USV path following policies.
• The novel dynamic estimation method learns a dynamic context that implicitly depicts the dynamic characteristics of the target USV, using samples collected by the proposed Z-policy.
ISSN:	0029-8018, 1873-5258
DOI:	10.1016/j.oceaneng.2023.115958