A Novel Multi-Objective Deep Q-Network: Addressing Immediate and Delayed Rewards in Multi-Objective Q-Learning
Published in: | IEEE Access 2024, Vol.12, p.144932-144949 |
---|---|
Format: | Article |
Language: | English |
Summary: | Current multi-objective reinforcement learning (MORL) research often struggles to balance multiple objectives and manage the stability and performance of learning algorithms, especially in complex environments. To address this, we propose a new multi-objective deep Q-network (MO-DQN) framework that integrates linear scalarization in MORL. This framework has the following features: First, we provide immediate feedback by incorporating linear scalarization into reward processing. Compared with some complex multi-objective optimization methods, this approach is relatively easy to understand and implement, offering greater convenience for practical applications. Additionally, linear scalarization accelerates the learning process and enhances the algorithm's ability to dynamically adjust strategies. Second, we develop the Linear Scalarized Multi-objective Deep Q-Network (LSMO-DQN) under different reward mechanisms, improving MO-DQN's ability to balance multiple objectives effectively. Immediate reward strategies accelerate learning and enable rapid adjustments, benefiting dynamic environments. In contrast, delayed reward strategies help understand long-term action impacts and promote strategic decision-making. Experiments are conducted in two different multi-objective environments, where our proposed method slightly outperforms other techniques, indicating its robustness and adaptability. Specifically, LSMO-DQN achieves higher cumulative rewards and demonstrates improved stability across various reward structures. The findings suggest that integrating linear scalarization in reward processing not only enhances learning performance but also provides a more straightforward approach to managing the trade-offs in multi-objective settings. These results highlight the potential of LSMO-DQN to improve learning performance, particularly in scenarios with data imbalances. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3465628 |
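
The summary describes linear scalarization applied to reward processing under immediate and delayed reward mechanisms. The sketch below illustrates that general idea only; it is not taken from the article. The function names, the weight values, and in particular the delayed scheme (zero reward until the final step, which then receives the scalarized episode total) are assumptions chosen for illustration, and the article's own formulation may differ.

```python
from typing import List, Sequence


def scalarize(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: collapse a reward vector into a weighted sum."""
    if len(rewards) != len(weights):
        raise ValueError("rewards and weights must have the same length")
    return sum(w * r for w, r in zip(weights, rewards))


def immediate_rewards(episode: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Scalarize each step's reward vector as soon as it is observed."""
    return [scalarize(step, weights) for step in episode]


def delayed_rewards(episode: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Assumed delayed scheme: zero reward at every step except the last,
    which receives the scalarized sum of the whole episode."""
    if not episode:
        return []
    totals = [sum(objective) for objective in zip(*episode)]
    return [0.0] * (len(episode) - 1) + [scalarize(totals, weights)]


def td_target(reward: float, next_q_values: Sequence[float],
              gamma: float = 0.99, done: bool = False) -> float:
    """Standard one-step Q-learning target computed from a scalar reward."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


# Hypothetical two-objective episode with equal weights.
weights = [0.5, 0.5]
episode = [[1.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
immediate = immediate_rewards(episode, weights)
delayed = delayed_rewards(episode, weights)
```

With no discounting, both schemes assign the same total scalar reward to an episode; they differ only in when the learner receives the signal, which matches the trade-off the summary draws between rapid adjustment and long-term credit assignment.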