Immune deep reinforcement learning-based path planning for mobile robot in unknown environment
Published in: Applied Soft Computing, 2023-09, Vol. 145, Article 110601
Main Authors: , , , ,
Format: Article
Language: English
Summary: A new deep deterministic policy gradient (DDPG) algorithm integrating kinematics analysis and immune optimization (KAI-DDPG) is proposed to address the drawbacks of DDPG in path planning. Based on kinematic modeling and analysis of mobile robots, an orientation angle reward factor, a linear velocity reward factor, and a safety performance reward factor are added to the DDPG reward function. A multi-objective performance index turns the path planning problem into a multi-objective optimization problem. We propose KA-DDPG, which uses the orientation angle, linear speed, and safety degree as evaluation indices and uses information entropy to adjust the influence coefficients of the multi-objective function in the reward function. KAI-DDPG is then proposed to address the low learning and training efficiency of KA-DDPG, using immune optimization to optimize the experience samples in the experience buffer pool. The performance indices of traditional path planning techniques and the proposed methods are compared on the Gazebo simulation platform, and the results suggest that KAI-DDPG can mitigate the drawbacks of DDPG, such as a protracted training cycle and poor planned-path quality, and can broaden its range of application.
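As a rough illustration of the reward shaping described in the summary, the sketch below combines the orientation angle, linear velocity, and safety factors into one scalar reward. This is a minimal sketch, not the paper's implementation: the function name, the weight parameters (w_angle, w_vel, w_safe), and the constants (v_max, d_safe) are all assumptions for illustration; the weights play the role of the influence coefficients that the paper tunes with information entropy.

```python
import math

# Hypothetical reward shaping in the spirit of KA-DDPG: three reward
# factors derived from the robot's kinematic state. All weights and
# thresholds here are illustrative, not the paper's values.
def shaped_reward(heading_error, linear_velocity, min_obstacle_dist,
                  w_angle=1.0, w_vel=1.0, w_safe=1.0,
                  v_max=0.5, d_safe=0.4):
    # Orientation angle factor: largest when the robot heads toward the goal.
    r_angle = math.cos(heading_error)
    # Linear velocity factor: rewards moving at speed toward the goal
    # rather than spinning in place.
    r_vel = linear_velocity / v_max
    # Safety factor: penalizes approaching an obstacle closer than d_safe.
    r_safe = 0.0 if min_obstacle_dist >= d_safe else (min_obstacle_dist - d_safe) / d_safe
    return w_angle * r_angle + w_vel * r_vel + w_safe * r_safe
```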
• Combined with the kinematics analysis of the mobile robot, orientation angle, linear velocity, and safety performance reward factors are introduced into the DDPG reward function, so the mobile robot plans a smoother trajectory, shortens the navigation distance and time, and improves the navigation success rate.
• Information entropy is used to adjust the influence coefficients in the reward function, making its design more reasonable, adapting it to simple, complex, and dynamic environments, and enhancing the generalization of the algorithm (a sketch of one such entropy weighting follows this list).
• To satisfy the requirement of diverse experiences, the immune algorithm is used to deeply mine the excellent experience in the experience buffer pool, so that the mobile robot can realize a better motion control strategy and improve the utilization rate of experience samples, enabling the reward value to reach a stable value more quickly while accelerating the convergence of DDPG (see the clonal-selection sketch below).
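One widely used way to realize the entropy-based coefficient adjustment in the second highlight is the entropy weight method, which the sketch below assumes (the paper's exact formulation may differ): objectives whose recent scores diverge more across episodes receive larger influence coefficients.

```python
import numpy as np

# Entropy weight method (a standard technique; assumed here, not taken
# from the paper): given recent per-episode scores for each objective
# (orientation, velocity, safety), compute influence coefficients.
def entropy_weights(scores):
    # scores: (n_samples, n_objectives) array of non-negative indicator values.
    p = scores / scores.sum(axis=0, keepdims=True)          # column-normalize
    p = np.where(p > 0, p, 1e-12)                           # avoid log(0)
    e = -(p * np.log(p)).sum(axis=0) / np.log(len(scores))  # entropy per objective
    d = 1.0 - e                                             # degree of divergence
    return d / d.sum()                                      # coefficients sum to 1

# Example: 5 episodes, 3 objectives -> one coefficient per objective.
w = entropy_weights(np.random.rand(5, 3))
```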
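The immune mining of the experience buffer in the third highlight could follow a clonal-selection pattern like the sketch below. The affinity measure (here, the transition's reward), the clone factor, and the Gaussian hypermutation on actions are all assumptions for illustration, not the paper's operators.

```python
import random

# Illustrative clonal-selection pass over a replay buffer, in the spirit
# of KAI-DDPG's immune optimization of experience samples.
def immune_select(buffer, batch_size, clone_factor=3, mutation_std=0.01):
    # buffer: list of (state, action, reward, next_state, done) tuples,
    # where action is a list of floats.
    ranked = sorted(buffer, key=lambda t: t[2], reverse=True)  # affinity = reward
    elite = ranked[: max(1, len(ranked) // 4)]                 # high-affinity pool
    clones = []
    for s, a, r, s2, d in elite * clone_factor:
        # Hypermutation: small Gaussian noise on the action keeps the
        # batch diverse while staying near good experience.
        a_mut = [x + random.gauss(0.0, mutation_std) for x in a]
        clones.append((s, a_mut, r, s2, d))
    pool = elite + clones
    return random.sample(pool, min(batch_size, len(pool)))
```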
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2023.110601