Multi-agent Proximal Policy Optimization via Non-fixed Value Clipping
Format: Conference Proceeding
Language: English
Summary: With the wide application of multi-agent reinforcement learning (MARL), the field has grown increasingly mature. Multi-agent Proximal Policy Optimization (MAPPO), an extension of the Proximal Policy Optimization (PPO) algorithm, has attracted researchers' attention with its superior performance. However, as the number of agents in cooperative tasks increases, the fixed clip range that limits the update step size leads to overfitting and suboptimal policies. This paper proposes the MAPPO via Non-fixed Value Clipping (NVC-MAPPO) algorithm, which builds on MAPPO by introducing Gaussian noise into the value function and the clipping function, rewriting the latter into a form called the non-fixed value clipping function. Experiments on the StarCraft II Multi-Agent Challenge (SMAC) verify that the algorithm effectively prevents the step size from changing too much while enhancing the agents' exploration ability, yielding improved performance over MAPPO.
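The abstract's central mechanism, replacing PPO's fixed clip range with one perturbed by Gaussian noise, can be illustrated with a short sketch. The following PyTorch snippet is a minimal, hypothetical rendering of such a clipped surrogate; the function name, the `noise_std` parameter, and the exact way the noise enters the clip range are assumptions, since this record reproduces only the abstract (the noise the paper also adds to the value function is omitted here).

```python
import torch

def nvc_clipped_surrogate(log_probs, old_log_probs, advantages,
                          eps=0.2, noise_std=0.05):
    """PPO-style clipped surrogate with a non-fixed clip range.

    Sketch only: eps is perturbed by Gaussian noise each update,
    loosely following the abstract; noise_std and the clamping of
    the perturbed range are assumptions, not taken from the paper.
    """
    ratio = torch.exp(log_probs - old_log_probs)
    # Perturb the fixed clip range with Gaussian noise; keep it
    # positive so the clipped surrogate remains well defined.
    eps_t = max(eps + noise_std * float(torch.randn(1)), 1e-3)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps_t, 1.0 + eps_t) * advantages
    # PPO maximizes the pessimistic minimum of the two surrogates.
    return torch.min(unclipped, clipped).mean()
```

Under these assumptions, each update samples a slightly different clip range, so the permitted step size varies around the usual fixed bound while the pessimistic minimum still restrains overly large policy updates.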
ISSN: 2767-9861
DOI: 10.1109/DDCLS58216.2023.10167264