
Unmanned Aerial Vehicle Path Planning Algorithm Based on Deep Reinforcement Learning in Large-Scale and Dynamic Environments

Bibliographic Details
Published in: IEEE Access, 2021, Vol. 9, p. 24884-24900
Main Authors: Xie, Ronglei, Meng, Zhijun, Wang, Lifeng, Li, Haochen, Wang, Kaipeng, Wu, Zhe
Format: Article
Language:English
Summary: Path planning is one of the key technologies for the autonomous flight of Unmanned Aerial Vehicles (UAVs). Traditional path planning algorithms have limitations and deficiencies in complex and dynamic environments. In this article, we propose a deep reinforcement learning approach for three-dimensional path planning that uses only local information and relative distance, without global information. In real scenarios, a UAV with limited sensor capabilities can obtain only limited information about its immediate surroundings, so path planning can be formulated as a Partially Observable Markov Decision Process (POMDP). A recurrent neural network with temporal memory is constructed to address the partial observability by extracting crucial information from historical state-action sequences. We develop an action selection strategy that combines the current reward value and the state-action value to reduce meaningless exploration. In addition, we construct two sample memory pools and propose an adaptive experience replay mechanism based on the frequency of failure. Simulation results show that our method offers significant improvements over Deep Q-Network and Deep Recurrent Q-Network in terms of stability and learning efficiency. Our approach successfully plans a reasonable three-dimensional path in a large-scale, complex environment and reliably avoids obstacles in the unknown environment.
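
The two main algorithmic ideas in the summary can be illustrated with a short sketch. The following Python/PyTorch snippet is a minimal reconstruction based only on the abstract, not the authors' implementation: the class names RecurrentQNetwork and AdaptiveDualReplay, the hidden size, the input encoding (observation concatenated with a one-hot previous action), and the failure-frequency sampling rule are all illustrative assumptions.

import random
from collections import deque

import torch
import torch.nn as nn


class RecurrentQNetwork(nn.Module):
    """LSTM over state-action history -> Q-values, for the POMDP setting."""

    def __init__(self, state_dim, n_actions, hidden_dim=128):
        super().__init__()
        # Input at each step: current observation + previous action (one-hot),
        # so the recurrent hidden state can summarize the history.
        self.lstm = nn.LSTM(state_dim + n_actions, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, seq, hidden=None):
        # seq: (batch, time, state_dim + n_actions)
        out, hidden = self.lstm(seq, hidden)
        return self.head(out[:, -1]), hidden  # Q-values from the last step


class AdaptiveDualReplay:
    """Two sample pools (success / failure); the failure pool is sampled
    with probability tied to the recent failure frequency (assumed rule)."""

    def __init__(self, capacity=10000, window=100):
        self.success = deque(maxlen=capacity)  # transitions from successful episodes
        self.failure = deque(maxlen=capacity)  # transitions from failed episodes
        self.outcomes = deque(maxlen=window)   # 1 = failed episode, 0 = succeeded

    def add(self, transition, failed):
        (self.failure if failed else self.success).append(transition)

    def end_episode(self, failed):
        self.outcomes.append(1 if failed else 0)

    def sample(self, batch_size):
        # Draw from the failure pool in proportion to how often
        # recent episodes failed.
        p_fail = sum(self.outcomes) / max(len(self.outcomes), 1)
        batch = []
        for _ in range(batch_size):
            use_fail = self.failure and random.random() < p_fail
            pool = self.failure if use_fail else self.success
            if pool:
                batch.append(random.choice(pool))
        return batch


# Usage sketch: Q-values for one 8-step history with 6 observation
# features and 7 discrete actions (dimensions chosen for illustration).
net = RecurrentQNetwork(state_dim=6, n_actions=7)
q_values, _ = net(torch.randn(1, 8, 6 + 7))

Feeding the previous action alongside the observation lets the LSTM's hidden state stand in for the unobserved parts of the true state, which is the standard way a DRQN-style agent handles partial observability.
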
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3057485