UAV path planning based on the average TD3 algorithm with prioritized experience replay
Published in: IEEE Access, 2024-01, Vol. 12, p. 1-1
Format: Article
Language: English
Summary: Path planning is a core component of Unmanned Aerial Vehicle (UAV) missions and a key prerequisite for their successful completion. Traditional path planning algorithms have notable limitations in complex, dynamic environments. Targeting dynamic environments with complex obstacles, this paper proposes an improved TD3 algorithm that enables a UAV to plan its path autonomously through online learning and continuous trial and error. The algorithm replaces the TD3 experience pool with prioritized experience replay, so the agent can distinguish the importance of experience samples, improving sampling efficiency and reducing training time. An average TD3 is proposed: when the target value is updated, the average of Q1 and Q2 is used, which mitigates overestimation of the Q value while avoiding underestimation, giving the improved algorithm better stability and allowing it to adapt to a variety of complex obstacle environments. A new reward function is designed so that every UAV action receives reward feedback, addressing the sparse-reward problem in deep reinforcement learning. Experimental results show that this method trains the UAV to reach the target safely and quickly in a multi-obstacle environment. Compared with DDPG, SAC, and the standard TD3, the proposed algorithm achieves a higher path planning success rate and a lower collision rate than all three baselines, demonstrating better path planning performance.
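As an illustration of the averaged target update described in the summary, the sketch below computes the TD3 target value from the mean of the two target critics instead of their minimum (the clipped double-Q rule of standard TD3). This is a minimal sketch assuming a PyTorch setup; the critic architecture, dimensions, and function names are illustrative assumptions rather than the authors' implementation.

```python
# Hypothetical sketch (not the paper's code): target computation for an
# "average TD3" update, where the two target critics are averaged rather
# than taking their minimum as in standard TD3.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Placeholder state-action value network Q(s, a)."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def averaged_td3_target(q1_target, q2_target, reward, next_state, next_action,
                        done, gamma=0.99):
    """Return y = r + gamma * (1 - done) * mean(Q1', Q2').

    Standard TD3 would use min(Q1', Q2'); averaging is the modification the
    summary attributes to the average TD3 algorithm.
    """
    with torch.no_grad():
        q1 = q1_target(next_state, next_action)
        q2 = q2_target(next_state, next_action)
        q_avg = 0.5 * (q1 + q2)  # average instead of torch.min(q1, q2)
        return reward + gamma * (1.0 - done) * q_avg


if __name__ == "__main__":
    # Toy usage with random tensors standing in for a sampled mini-batch.
    state_dim, action_dim, batch = 8, 2, 32
    q1_t = Critic(state_dim, action_dim)
    q2_t = Critic(state_dim, action_dim)
    y = averaged_td3_target(
        q1_t, q2_t,
        reward=torch.randn(batch, 1),
        next_state=torch.randn(batch, state_dim),
        next_action=torch.randn(batch, action_dim),
        done=torch.zeros(batch, 1),
    )
    print(y.shape)  # torch.Size([32, 1])
```

In a full training loop the mini-batch would be drawn from the prioritized replay buffer and target policy smoothing noise would be added to next_action before evaluating the target critics, as in standard TD3.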
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3375083