Loading…
A novel multi-step Q-learning method to improve data efficiency for deep reinforcement learning
Deep reinforcement learning (DRL) algorithms with experience replays have been used to solve many sequential learning problems. However, in practice, DRL algorithms still suffer from the data inefficiency problem, which limits their applicability in many scenarios, and renders them inefficient in so...
Saved in:
Published in: | Knowledge-based systems 2019-07, Vol.175, p.107-117 |
---|---|
Main Authors: | , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Deep reinforcement learning (DRL) algorithms with experience replays have been used to solve many sequential learning problems. However, in practice, DRL algorithms still suffer from the data inefficiency problem, which limits their applicability in many scenarios, and renders them inefficient in solving real-world problems. To improve the data efficiency of DRL, in this paper, a new multi-step method is proposed. Unlike traditional algorithms, the proposed method uses a new return function, which alters the discount of future rewards while decreasing the impact of the immediate reward when selecting the current state action. This approach has the potential to improve the efficiency of reward data. By combining the proposed method with classic DRL algorithms, deep Q-networks (DQN) and double deep Q-networks (DDQN), two novel algorithms are proposed for improving the efficiency of learning from experience replay. The performance of the proposed algorithms, expected n-step DQN (EnDQN) and expected n-step DDQN (EnDDQN), are validated using two simulation environments, CartPole and DeepTraffic. The experimental results demonstrate that the proposed multi-step methods greatly improve the data efficiency of DRL agents while further improving the performance of existing classic DRL algorithms when incorporated into their training.
•A novel multi-step Q-learning method is proposed to improve data efficiency for DRL.•The proposed multi-step Q-learning method is derived by adopting a new return function.•The new return function alters the discount of future rewards and loosens the impact of the immediate reward.•Experimental-results shows the proposed methods can improve the data efficiency of DRL agents. |
---|---|
ISSN: | 0950-7051 1872-7409 |
DOI: | 10.1016/j.knosys.2019.03.018 |