Graph convolutional recurrent networks for reward shaping in reinforcement learning
Published in: Information Sciences, 2022-08, Vol. 608, pp. 63-80
Main Authors: , , , ,
Format: Article
Language: English
Summary: In this paper, we consider the problem of slow convergence in Reinforcement Learning (RL). Various potential-based reward shaping techniques have been proposed as a solution, but learning a potential function remains challenging and is comparable to building a value function from scratch. Our main contribution is a new reward shaping scheme that combines (1) Graph Convolutional Recurrent Networks (GCRN), (2) an augmented Krylov basis, and (3) look-ahead advice to form the potential function. We propose a GCRN architecture that combines Graph Convolutional Networks (GCN), to capture spatial dependencies, with Bi-Directional Gated Recurrent Units (Bi-GRUs), to account for temporal dependencies. Our loss function for the GCRN incorporates the message-passing technique of Hidden Markov Models (HMMs). Since the transition matrix of the environment is hard to compute, we estimate it with a Krylov basis, which outperforms existing approximation bases. Unlike existing potential functions that rely only on states to perform reward shaping, we use both states and actions through the look-ahead advice mechanism to produce more precise advice. Our evaluations on the Atari 2600 and MuJoCo games show that our solution outperforms the state of the art, which uses a GCN as the potential function, in most games in terms of learning speed while reaching higher rewards.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2022.06.050
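The abstract above names three components: a state-action potential used as look-ahead advice, a Krylov basis approximating the transition matrix, and a GCN + Bi-GRU network producing the potential. The sketches below illustrate each idea in Python under stated assumptions; they are illustrative readings of the abstract, not the authors' published implementation.

Look-ahead advice extends classical potential-based shaping from state potentials Phi(s) to state-action potentials Phi(s, a), so the shaping term also depends on the next action. A minimal sketch, where `phi` is any callable state-action potential, here a hypothetical stand-in for the GCRN output:

```python
def shaped_reward(reward, phi, s, a, s_next, a_next, gamma=0.99):
    """Look-ahead advice: F = gamma * phi(s', a') - phi(s, a), added to the raw reward."""
    return reward + gamma * phi(s_next, a_next) - phi(s, a)
```

A Krylov basis is built from repeated applications of an operator to a start vector. A minimal sketch, assuming the transition operator is only available as a matrix-vector product `matvec`; this is the standard Gram-Schmidt construction, and the augmented variant the abstract mentions is not reproduced here:

```python
import numpy as np

def krylov_basis(matvec, v0, k):
    """Orthonormal basis of the Krylov subspace span{v0, T v0, ..., T^(k-1) v0}."""
    basis = [v0 / np.linalg.norm(v0)]
    for _ in range(k - 1):
        w = matvec(basis[-1])
        for q in basis:                  # Gram-Schmidt against earlier basis vectors
            w = w - (q @ w) * q
        norm = np.linalg.norm(w)
        if norm < 1e-10:                 # subspace became invariant; stop early
            break
        basis.append(w / norm)
    return np.stack(basis)               # shape (<= k, dim)
```

Finally, the GCN + Bi-GRU combination can be sketched as a small PyTorch module: a graph convolution per timestep captures spatial structure, a bidirectional GRU runs over the timestep sequence, and a linear head emits a scalar potential. The dimensions, mean pooling, and module names below are assumptions, and the HMM-style message-passing loss is omitted:

```python
import torch
import torch.nn as nn

class GCRNSketch(nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hidden_dim)   # shared GCN weight W
        self.bigru = nn.GRU(hidden_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 1)   # scalar potential per timestep

    def forward(self, x, adj):
        # x: (T, N, in_dim) node features over T timesteps
        # adj: (N, N) normalized adjacency matrix (A-hat)
        h = torch.relu(adj @ self.gcn(x))          # GCN step: A-hat X W, per timestep
        h = h.mean(dim=1)                          # mean-pool nodes -> (T, hidden_dim)
        out, _ = self.bigru(h.unsqueeze(0))        # Bi-GRU over time -> (1, T, 2*hidden)
        return self.head(out).squeeze(0).squeeze(-1)  # potentials: shape (T,)
```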