
Model-Based Actor-Critic Learning for Optimal Tracking Control of Robots With Input Saturation

Bibliographic Details
Published in: IEEE Transactions on Industrial Electronics (1982), 2021-06, Vol. 68 (6), p. 5046-5056
Main Authors: Zhao, Xingwei; Tao, Bo; Qian, Lu; Ding, Han
Format: Article
Language: English
Description
Summary: As robots normally perform repetitive work, reinforcement learning (RL) appears to be a promising tool for designing robot control. However, the learning cycle of a control strategy tends to be long, which limits the application of RL to real robotic systems. This article proposes model-based actor-critic learning for optimal tracking control of robotic systems to address this limitation. A preconstructed critic is defined in the framework of the linear quadratic tracker, and a model-based actor update law based on the deterministic policy gradient algorithm is presented to improve learning efficiency. A low-gain parameter is introduced into the critic to avoid input saturation. Compared with neural-network-based RL, the proposed method, with its preconstructed critic and actor, has a rapid, steady, and reliable learning process that is friendly to physical hardware. The performance and effectiveness of the proposed method are validated on a dual-robot test rig. The experimental results show that the proposed learning algorithm can train multiple robots to learn their optimal tracking control laws within a training time of 200 s.
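
The abstract describes the method only at a high level, so a minimal sketch may help fix ideas. The Python snippet below illustrates the general pattern the abstract names: a critic preconstructed offline from the model in the linear quadratic tracker framework, a model-based actor updated with the deterministic policy gradient, and a low-gain weight to keep the control below its saturation bound. The double-integrator model, reference generator, discount factor, saturation level u_max, learning rate, and all matrix values are illustrative assumptions, not details taken from the paper.

    import numpy as np

    # Hypothetical single-joint double-integrator model (assumed, not from the paper).
    dt = 0.01
    A = np.array([[1.0, dt], [0.0, 1.0]])   # state x = [position, velocity]
    B = np.array([[0.0], [dt]])
    F = np.eye(2)                            # reference generator: r_{k+1} = F r_k

    # Augmented state z = [x; r] gives linear tracking dynamics z_{k+1} = Az z + Bz u.
    Az = np.block([[A, np.zeros((2, 2))], [np.zeros((2, 2)), F]])
    Bz = np.vstack([B, np.zeros((2, 1))])

    # Preconstructed critic V(z) = z' P z: solve a discounted LQT Riccati equation
    # offline from the model. The tracking error is e = x - r = C z.
    eps = 0.05                               # low-gain parameter: a small error weight
    C = np.hstack([np.eye(2), -np.eye(2)])   # yields small feedback gains, keeping the
    Qz = eps * (C.T @ C)                     # control away from its saturation bound
    R = np.array([[1.0]])
    gamma = 0.99

    P = np.eye(4)
    for _ in range(1000):                    # value-iteration-style fixed point
        K_lq = np.linalg.solve(R + gamma * Bz.T @ P @ Bz, gamma * Bz.T @ P @ Az)
        Acl = Az - Bz @ K_lq
        P = Qz + K_lq.T @ R @ K_lq + gamma * Acl.T @ P @ Acl

    # Model-based actor: linear policy u = -K z, improved with the deterministic
    # policy gradient. The model (Az, Bz) enters grad_u Q directly, so no
    # exploration noise or online critic learning is needed during these updates.
    K = np.zeros((1, 4))
    lr = 1e-3
    u_max = 2.0                              # assumed actuator saturation level

    for episode in range(200):
        z = np.concatenate([np.random.randn(2), [1.0, 0.0]])
        for _ in range(100):
            u = np.clip(-K @ z, -u_max, u_max)
            # Q(z, u) = z'Qz z + u'R u + gamma (Az z + Bz u)' P (Az z + Bz u)
            grad_u_Q = 2.0 * (R @ u + gamma * Bz.T @ P @ (Az @ z + Bz @ u))
            # DPG chain rule with dmu/dK = -z', so dJ/dK = -grad_u_Q z';
            # the += below is therefore gradient descent on the tracking cost.
            K += lr * np.outer(grad_u_Q, z)
            z = Az @ z + Bz @ u

The split mirrors what the abstract emphasizes: the critic is fixed in advance from the model rather than learned from data, so only the low-dimensional actor gains adapt online, which is what can make the learning process fast and hardware-friendly; the low-gain parameter eps trades tracking aggressiveness for headroom below the saturation limit.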
ISSN: 0278-0046, 1557-9948
DOI: 10.1109/TIE.2020.2992003