Potential-based reward shaping using state–space segmentation for efficiency in reinforcement learning
Published in: Future Generation Computer Systems, 2024-08, Vol. 157, pp. 469-484
Main Authors:
Format: Article
Language: English
Subjects:
Summary: Reinforcement Learning (RL) algorithms suffer from slow learning in environments with sparse explicit reward structures due to the limited feedback available on the agent's behavior. This problem is particularly pronounced in complex tasks with large state and action spaces. To address this inefficiency, in this paper we propose a novel approach based on potential-based reward shaping using state–space segmentation to decompose the task and provide more frequent feedback to the agent. Our approach extracts state–space segments by formulating the problem as a minimum-cut problem on a transition graph, constructed from the agent's experiences during interactions with the environment via the Extended Segmented Q-Cut algorithm. These segments are then leveraged in the agent's learning process through potential-based reward shaping. Our experiments on benchmark problem domains with sparse rewards demonstrate that the proposed method effectively accelerates the agent's learning without compromising computation time, while upholding the policy-invariance principle.
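The abstract invokes policy invariance but does not restate the shaping term itself. As background, the standard potential-based shaping form (Ng, Harada, and Russell, 1999), on which that guarantee rests, adds a difference of potentials to the environment reward:

```latex
F(s, a, s') = \gamma\,\Phi(s') - \Phi(s), \qquad
\tilde{r}(s, a, s') = r(s, a, s') + F(s, a, s')
```

Here \(\Phi\) is the potential function; in the proposed method it is derived from the state–space segments, with segments containing or closer to the goal presumably assigned higher potential (see the highlights below). Because the shaping term telescopes along any trajectory, it shifts returns only by a constant depending on the start state and therefore does not change which policy is optimal.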
Highlights:
- An online and proper segmentation of the state space of the given RL problem with the Extended Segmented Q-Cut approach leads to a decomposition of the task for the learning agent.
- Applying reward shaping based on this segmentation compensates for the environment's sparse rewards with shaped rewards that serve as immediate feedback.
- A segment that contains, or lies closer to, the goal state naturally designates the potential of its states in the resulting policy.
- Guiding the learning agent toward such a segment, without violating the policy-invariance property, facilitates early convergence to an optimal solution.
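The highlights describe shaping rewards that follow from segment membership. The following is a minimal illustrative sketch of that idea, not the authors' Extended Segmented Q-Cut implementation: the state names, segment assignments, and potential values are assumptions chosen only to show how segment-derived potentials could feed a shaped tabular Q-learning update.

```python
from collections import defaultdict

GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate

# Hypothetical segmentation: each state is mapped to a segment id, e.g. one
# side of a minimum cut of a transition graph built from the agent's experience.
STATE_SEGMENT = {"start": 0, "corridor": 1, "near_goal": 2, "goal": 2}

# Hypothetical segment potentials: segments containing or closer to the goal
# get higher potential, so shaping pulls the agent toward the goal segment.
SEGMENT_POTENTIAL = {0: 0.0, 1: 0.5, 2: 1.0}

def phi(state):
    """Potential of a state, inherited from its segment."""
    return SEGMENT_POTENTIAL[STATE_SEGMENT[state]]

def shaped_reward(reward, state, next_state):
    """Potential-based shaping F(s, s') = gamma * phi(s') - phi(s),
    added to the sparse environment reward as immediate feedback."""
    return reward + GAMMA * phi(next_state) - phi(state)

def q_update(Q, state, action, reward, next_state):
    """One tabular Q-learning step driven by the shaped reward."""
    r = shaped_reward(reward, state, next_state)
    best_next = max(Q[next_state].values()) if Q[next_state] else 0.0
    Q[state][action] += ALPHA * (r + GAMMA * best_next - Q[state][action])

# Example: a zero environment reward still yields positive shaped feedback
# when the agent crosses into a higher-potential segment.
Q = defaultdict(lambda: defaultdict(float))
q_update(Q, "start", "right", 0.0, "corridor")
```

In this sketch the only design input is the segmentation: once states are grouped, potentials are assigned per segment rather than per state, which is what turns the segmentation into denser feedback without hand-crafting a reward for every state.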
ISSN: 0167-739X, 1872-7115
DOI: 10.1016/j.future.2024.03.057