Loading…

A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning

Deep reinforcement learning (DRL) algorithms with experience replays have been used to solve many sequential learning problems. However, in practice, DRL algorithms still suffer from the data inefficiency problem, which limits their applicability in many scenarios, and renders them inefficient in so...

Full description

Saved in:

Bibliographic Details
Published in:	Knowledge-based systems 2019-07, Vol.175, p.107-117
Main Authors:	Yuan, Yinlong, Yu, Zhu Liang, Gu, Zhenghui, Yeboah, Yao, Wei, Wu, Deng, Xiaoyan, Li, Jingcong, Li, Yuanqing
Format:	Article
Language:	English
Subjects:	Algorithms Computer simulation Data efficiency Deep reinforcement learning Efficiency Machine learning Multi-step methods Robotics
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Deep reinforcement learning (DRL) algorithms with experience replays have been used to solve many sequential learning problems. However, in practice, DRL algorithms still suffer from the data inefficiency problem, which limits their applicability in many scenarios, and renders them inefficient in solving real-world problems. To improve the data efficiency of DRL, in this paper, a new multi-step method is proposed. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount of future rewards while decreasing the impact of the immediate reward when selecting the current state action. This approach has the potential to improve the efficiency of reward data. By combining the proposed method with classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms, expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN), are validated using two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents while further improving the performance of existing classic DRL algorithms when incorporated into their training. •A novel multi-step Q-learning method is proposed to improve data efficiency for DRL.•The proposed multi-step Q-learning method is derived by adopting a new return function.•The new return function alters the discount of future rewards and loosens the impact of the immediate reward.•Experimental-results shows the proposed methods can improve the data efficiency of DRL agents.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2019.03.018