Generative adversarial interactive imitation learning for path following of autonomous underwater vehicle

Bibliographic Details
Published in: Ocean Engineering, 2022-09, Vol. 260, p. 111971, Article 111971
Main Authors: Jiang, Dong; Huang, Jie; Fang, Zheng; Cheng, Chunxi; Sha, Qixin; He, Bo; Li, Guangliang
Format: Article
Language:English
Description
Summary: Autonomous underwater vehicles (AUVs) are playing an increasingly important role in marine scientific research and resource exploration due to their flexibility. Recently, deep reinforcement learning (DRL) has been used to improve the autonomy of AUVs. However, defining efficient reward functions for DRL to learn control policies across tasks is time-consuming and often impractical. In this paper, we implemented the generative adversarial imitation learning (GAIL) algorithm, which learns from demonstrated trajectories, and proposed GA2IL, which learns from demonstrations and additional human rewards, for AUV path following. We evaluated GAIL and our GA2IL method on a straight-line following task and a sinusoidal curve following task on the Gazebo platform, extended to simulated underwater environments with our lab's AUV simulator. Both methods were compared to PPO, a classic deep reinforcement learning algorithm learning from a predefined reward function, and to a well-tuned PID controller. In addition, to evaluate the generalization of GAIL and our GA2IL method, we tested the control policies trained on the previous two tasks in a new, complex comb-scan following task and a different sinusoidal curve following task, respectively. Our simulation results show that AUV path following with GA2IL and GAIL reaches a performance level similar to the PPO and PID controllers in both tasks. Moreover, GA2IL generalizes as well as PPO, adapting better to complex and different tasks than the traditional PID controller.
•Implemented GAIL learning from demonstrations for AUV path following.
•Proposed GA2IL, allowing an AUV to learn a control policy from human rewards and demonstrations.
•GAIL and GA2IL achieve performance similar to PPO and a PID controller.
•GA2IL generalizes as well as PPO, adapting better to complex tasks than a PID controller.
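The summary describes GA2IL as extending GAIL with additional human rewards: a discriminator trained against expert demonstrations supplies an imitation reward, and feedback from a human trainer is blended in before the policy update (PPO in the paper). The minimal PyTorch sketch below illustrates that idea only; the network sizes, the surrogate-reward form -log(1 - D(s, a)), the blending weight alpha, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs: expert-like -> 1, policy-generated -> 0."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),  # raw logit
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def discriminator_loss(disc, expert_s, expert_a, policy_s, policy_a):
    """Standard GAIL discriminator objective (binary cross-entropy)."""
    bce = nn.BCEWithLogitsLoss()
    expert_logits = disc(expert_s, expert_a)
    policy_logits = disc(policy_s, policy_a)
    return (bce(expert_logits, torch.ones_like(expert_logits))
            + bce(policy_logits, torch.zeros_like(policy_logits)))

def ga2il_reward(disc, state, action, human_reward, alpha: float = 0.5):
    """Blend the imitation reward with a human-provided scalar reward.

    -log(1 - D(s, a)) grows as the discriminator judges the pair more
    expert-like; `human_reward` is the trainer's feedback signal. The fixed
    weight `alpha` is an assumption; the paper may combine the two signals
    differently.
    """
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
        imitation = -torch.log(1.0 - d + 1e-8)
    return alpha * imitation + (1.0 - alpha) * human_reward
```

In a full training loop, the discriminator would be updated on alternating batches of demonstration and policy rollouts, and ga2il_reward would replace the environment's predefined reward in the PPO update; dropping the human_reward term recovers plain GAIL.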
ISSN: 0029-8018
1873-5258
DOI: 10.1016/j.oceaneng.2022.111971