Improvement of MADRL Equilibrium Based on Pareto Optimization
Published in: The Computer Journal, 2023-07, Vol. 66 (7), pp. 1573-1585
Main Authors: , , , ,
Format: Article
Language: English
Abstract:
To address the intractability caused by inconsistent objective functions in multi-agent deep reinforcement learning, the concept of Nash equilibrium is introduced. However, a Markov game may have multiple equilibria, so how to select a stable and optimal one is worth studying. Beyond the solution concept, keeping the balance between exploration and exploitation is another key issue in reinforcement learning. Building on methods that can converge to a Nash equilibrium, this paper makes improvements through Pareto optimization. To alleviate the overfitting caused by Pareto optimization and the non-convergence caused by strategy changes, we use stratified sampling in place of random sampling as assistance. Moreover, our methods are trained through fictitious self-play to make full use of self-play experiences. Experiments carried out on the MAgent platform show that the proposed methods are not only far better than traditional methods but also match or even surpass state-of-the-art MADRL methods.
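The abstract's replacement of random sampling with stratified sampling can be illustrated with a minimal sketch. This is not the paper's implementation; the stratification key (here, the sign of the reward) and the proportional-allocation rule are illustrative assumptions:

```python
import random
from collections import defaultdict

def stratified_sample(buffer, key_fn, batch_size, rng=random):
    """Draw a batch by sampling proportionally from each stratum of the
    replay buffer, rather than uniformly at random from the whole buffer."""
    # Partition transitions into strata by the user-supplied key.
    strata = defaultdict(list)
    for transition in buffer:
        strata[key_fn(transition)].append(transition)

    batch = []
    n = len(buffer)
    for group in strata.values():
        # Proportional allocation: each non-empty stratum contributes
        # at least one transition, so rare strata are never starved.
        k = max(1, round(batch_size * len(group) / n))
        batch.extend(rng.sample(group, min(k, len(group))))

    rng.shuffle(batch)
    return batch[:batch_size]

# Hypothetical usage: stratify transitions by whether the reward is positive.
buffer = [{"reward": r} for r in [-1, 0, 1, 1, -1, 0, 1, -1, 0, 1]]
batch = stratified_sample(buffer, key_fn=lambda t: t["reward"] > 0, batch_size=4)
```

Compared with uniform random sampling, this guarantees every stratum is represented in each batch, which is one plausible way stratification could reduce the overfitting the abstract attributes to Pareto-filtered experiences.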
ISSN: 0010-4620, 1460-2067
DOI: 10.1093/comjnl/bxac027