Cooperative control of velocity and heading for unmanned surface vessel based on twin delayed deep deterministic policy gradient with an integral compensator
Published in: | Ocean engineering 2023-11, Vol.288, p.115943, Article 115943 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This paper addresses cooperative control of velocity and heading for an unmanned surface vessel (USV) using the twin delayed deep deterministic policy gradient (TD3) reinforcement learning algorithm. A deep neural network maps the USV’s state parameters directly to motor control quantities. A reward function is devised to guide the updates of the network parameters and thereby obtain the trained model. Introducing an integral compensator eliminates the steady-state error of the system, which significantly improves the precision of both velocity control and heading control. Furthermore, a two-stage training algorithm comprising offline learning and online learning is devised. Offline learning yields a deep neural network model for the USV controller; the controller strategy is then optimized during the online learning phase. Simulation results show that the proposed algorithm achieves excellent control performance.
• A twin delayed deep deterministic policy gradient algorithm with integral compensation (TD3-IC) is proposed. • A two-stage training algorithm first trains offline in a simulated environment and then trains online to optimize the control strategy. • The performance of the TD3-IC controller is compared experimentally with that of other controllers. • Generalization and anti-interference experiments are carried out on the model. |
---|---|
ISSN: | 0029-8018 1873-5258 |
DOI: | 10.1016/j.oceaneng.2023.115943 |
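
The integral compensation mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the record does not give the state vector, reward function, or compensator design, so everything below is an assumption chosen for illustration. The names `IntegralCompensator`, `build_observation`, and `reward`, the clamping limit, and the reward weights are all hypothetical. The idea shown is the general one: accumulate the velocity and heading tracking errors over time and feed the accumulated values to the policy alongside the instantaneous errors, so that a persistent offset keeps growing in the observation until the controller removes it.

```python
import math
from dataclasses import dataclass


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


@dataclass
class IntegralCompensator:
    """Running integral of a tracking error (hypothetical helper).

    The clamp to [-limit, limit] is a simple anti-windup measure and is
    an assumption of this sketch, not something stated in the record.
    """
    dt: float
    limit: float
    value: float = 0.0

    def update(self, error: float) -> float:
        # Rectangular integration of the error, then clamp.
        self.value += error * self.dt
        self.value = max(-self.limit, min(self.limit, self.value))
        return self.value


def build_observation(u, u_ref, psi, psi_ref, int_u, int_psi):
    """Return [velocity error, heading error, and their integrals].

    u / u_ref are the actual and desired surge velocity, psi / psi_ref
    the actual and desired heading in radians (assumed conventions).
    """
    e_u = u_ref - u
    e_psi = wrap_angle(psi_ref - psi)
    return [e_u, e_psi, int_u.update(e_u), int_psi.update(e_psi)]


def reward(obs, w_err=1.0, w_int=0.1):
    """Illustrative quadratic penalty on errors and their integrals."""
    e_u, e_psi, i_u, i_psi = obs
    return -(w_err * (e_u ** 2 + e_psi ** 2) + w_int * (i_u ** 2 + i_psi ** 2))
```

In a TD3 training loop, an observation built this way would replace the raw error vector passed to the actor and critics, and the two compensators would be reset at the start of each episode.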