
Multi-agent Proximal Policy Optimization via Non-fixed Value Clipping

Bibliographic Details
Main Authors: Liu, Chiqiang, Li, Dazi
Format: Conference Proceeding
Language: English
Description
Summary: With the wide application of multi-agent reinforcement learning (MARL), the field has matured considerably. Multi-Agent Proximal Policy Optimization (MAPPO), an extension of the Proximal Policy Optimization (PPO) algorithm, has attracted researchers' attention with its superior performance. However, as the number of agents in cooperative multi-agent tasks grows, the fixed clip range that limits the update step size leads to overfitting and suboptimal policies. This paper proposes MAPPO via Non-fixed Value Clipping (NVC-MAPPO), which builds on MAPPO by introducing Gaussian noise into the value function and the clipping function, rewriting the latter into a non-fixed value clipping function. Experiments on the StarCraft II Multi-Agent Challenge (SMAC) verify that the algorithm effectively prevents the update step size from changing too much while enhancing the agents' exploration ability, improving performance over MAPPO.
ISSN:2767-9861
DOI:10.1109/DDCLS58216.2023.10167264
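The summary describes the core modification: PPO's fixed clip range is made non-fixed by perturbing it with Gaussian noise. As a minimal illustrative sketch only (not the authors' code; the function name nvc_policy_loss, the noise scale noise_std, and the exact placement of the noise are assumptions made for illustration), the resulting clipped surrogate loss might look like this in PyTorch:

    import torch

    def nvc_policy_loss(log_probs, old_log_probs, advantages,
                        eps=0.2, noise_std=0.05):
        # Probability ratio between the new and old policies.
        ratio = torch.exp(log_probs - old_log_probs)
        # Perturb the clip range with zero-mean Gaussian noise so the
        # allowed step size varies across updates (hypothetical reading
        # of "non-fixed value clipping"); keep it strictly positive.
        eps_t = max(1e-3, eps + noise_std * float(torch.randn(())))
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - eps_t, 1.0 + eps_t) * advantages
        # Standard pessimistic PPO bound, negated for use as a loss.
        return -torch.min(unclipped, clipped).mean()

    # Example usage with dummy per-action data:
    lp = torch.tensor([-1.0, -0.5, -2.0])
    old_lp = torch.tensor([-1.1, -0.6, -1.9])
    adv = torch.tensor([0.5, -0.2, 1.0])
    loss = nvc_policy_loss(lp, old_lp, adv)

Because eps_t is resampled, the effective trust region widens or narrows stochastically around the nominal clip range, which matches the paper's stated aim of bounding the step size while adding exploration.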