Reinforcement Learning Compensator Robust to the Time Constants of First Order Delay Elements


Bibliographic Details
Main Authors: Kobayashi, Shoki; Shibuya, Takeshi
Format: Conference Proceeding
Language: English
Description
Summary: Reinforcement learning is a learning paradigm in which control is learned automatically through trial and error based on rewards. When reinforcement learning is employed for robot control, the action that is output by reinforcement learning and the input of the actuator are often the same. A robot's actuator has a time constant of a first-order delay element between input and output. Delays result in the deterioration of reinforcement learning performance because the environments that contain them lack the Markov property. Although there have been studies of such environments, they are problematic in that performance deteriorates when the time constant of the first-order delay element is greater than the control cycle. The principal contribution of this paper is to propose a compensator for reinforcement learning that is more effective than conventional methods for environments in which the time constant of the first-order delay element is greater than the control cycle. The purpose of the compensator is to minimize the difference between actions in delayed environments and those in environments without delay. Experiments reveal that the compensator increases rewards over wider ranges than conventional methods.
ISSN: 2577-1655
DOI: 10.1109/SMC42975.2020.9283188
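
Note: The summary refers to an actuator modeled as a first-order delay element whose time constant may exceed the control cycle. The sketch below is not taken from the paper. It is a minimal illustration, assuming a unity-gain first-order lag (tau * dy/dt = u - y) with the commanded action held constant over each control cycle. The names `first_order_lag_step`, `simulate`, `tau`, and `dt` are hypothetical. It shows that when `tau` is larger than `dt`, the actuator output reaches only a fraction of the commanded action within one cycle, and that the output depends on the previous output as well as on the current action.

```python
import math


def first_order_lag_step(y, u, tau, dt):
    """Advance a unity-gain first-order lag (tau * dy/dt = u - y) by one
    control cycle dt, holding the command u constant over that cycle.

    This is an assumed actuator model for illustration only.
    """
    alpha = 1.0 - math.exp(-dt / tau)  # fraction of the gap closed in one cycle
    return y + alpha * (u - y)


def simulate(actions, tau, dt, y0=0.0):
    """Return the actuator output after each control cycle."""
    y = y0
    outputs = []
    for u in actions:
        y = first_order_lag_step(y, u, tau, dt)
        outputs.append(y)
    return outputs


if __name__ == "__main__":
    dt = 0.02              # hypothetical control cycle in seconds
    actions = [1.0] * 5    # constant commanded action
    for tau in (0.005, 0.02, 0.1):  # hypothetical time constants in seconds
        out = simulate(actions, tau, dt)
        print(f"tau={tau:.3f}s", [round(v, 3) for v in out])
```

With the time constant well below the control cycle, the output nearly matches the command after one cycle. With the time constant above the control cycle, the output lags over several cycles, which is the regime the summary describes as problematic.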