Multiple Model Q-Learning for Stochastic Asynchronous Rewards

Bibliographic Details
Published in: Journal of Intelligent & Robotic Systems, 2016-03, Vol. 81 (3-4), p. 407-422
Main Authors: Campbell, Jeffrey S., Givigi, Sidney N., Schwartz, Howard M.
Format: Article
Language: English
Description
Summary: This paper investigates reinforcement learning problems where a stochastic time delay is present in the reinforcement signal, but the delay is unknown to the learning agent. This work posits that the agent may receive individual reinforcements out of order, which is a relaxation of an important assumption in previous works from the literature. To that end, a stochastic time delay is introduced into a mobile robot line-following application. The main contribution of this work is to provide a novel stochastic approximation algorithm, which is an extension of Q-learning, for the time-delayed reinforcement problem. The paper includes a proof of convergence as well as grid world simulation results from MATLAB, results of line-following simulations within the Cyberbotics Webots mobile robot simulator, and finally, experimental results using an e-Puck mobile robot to follow a real track despite the presence of large, stochastic time delays in its reinforcement signal.
ISSN: 0921-0296, 1573-0409
DOI: 10.1007/s10846-015-0222-2