DEMRL: Dynamic estimation meta reinforcement learning for path following on unseen unmanned surface vehicle
Published in: | Ocean engineering 2023-11, Vol.288, p.115958, Article 115958 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Reinforcement learning has been widely used for unmanned surface vehicle (USV) control tasks. However, its need for large numbers of training samples limits its transferability to new USVs. In this article, we propose a dynamic estimation meta reinforcement learning (DEMRL) approach that enables few-shot learning of the path following control policy. We first present a dynamic estimation method that learns a latent dynamic context feature. The learned context captures the hidden information of the USV dynamics from only a few estimation samples. We then propose a meta reinforcement learning based training framework to learn a generalizable path following control policy. Given the prior knowledge from the dynamic context, the well-trained policy can then adapt to the target USV during a rapid adaptation process. The proposed method represents an initial effort to tackle the few-shot learning challenge associated with training reinforcement learning based USV path following policies. Extensive experiments demonstrate that the proposed method achieves promising path following performance on unseen USVs with very little training data and a small training volume.
• To the best of our knowledge, this is the first study to address the few-shot learning problem in training RL-based USV path following policies.
• The novel dynamic estimation method learns a dynamic context that implicitly depicts the dynamic characteristics of the target USV, using samples collected by the proposed Z-policy. |
---|---|
ISSN: | 0029-8018 1873-5258 |
DOI: | 10.1016/j.oceaneng.2023.115958 |