
Centralized reinforcement learning for multi-agent cooperative environments

Bibliographic Details
Published in: Evolutionary Intelligence 2024-02, Vol. 17 (1), p. 267-273
Main Authors: Lu, Chengxuan, Bao, Qihao, Xia, Shaojie, Qu, Chongxiao
Format: Article
Language:English
Description
Summary: We study reinforcement learning methods in multi-agent domains where a central controller collects all information and decides an action for every agent. However, multi-agent reinforcement learning (MARL) suffers from a combinatorial explosion of the joint-action space. In this work, we propose an improved proximal policy optimization (PPO) algorithm, whose neural network is based on an attention mechanism, to address this combinatorial explosion. Our model outputs a joint action rather than distributed per-agent actions. Parameter sharing in the attention mechanism keeps the network size linear in the length of a single agent's local observation, regardless of the number of agents. Moreover, multi-agent credit assignment is handled naturally by gradient ascent through the attention layer. Experimental results demonstrate that our method outperforms independent PPO and centralized PPO with other network architectures.
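The abstract describes a centralized, attention-based policy that shares parameters across agents and emits one action per agent rather than enumerating the joint-action space. The following is a minimal sketch of that idea, not the authors' implementation: the class name, layer sizes, and the use of PyTorch's MultiheadAttention are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's code): a centralized attention-based
# policy head for PPO. Each agent's local observation is embedded with shared
# weights, agents exchange information via self-attention, and a shared head
# maps each agent's representation to action logits, so parameter count
# depends on the local observation size rather than the number of agents.
import torch
import torch.nn as nn

class AttentionJointPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)           # shared across agents
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.action_head = nn.Linear(embed_dim, n_actions)   # shared across agents
        self.value_head = nn.Linear(embed_dim, 1)

    def forward(self, obs):
        # obs: (batch, n_agents, obs_dim) -- the central controller sees all agents
        h = torch.relu(self.embed(obs))
        h, _ = self.attn(h, h, h)                  # agents attend to one another
        logits = self.action_head(h)               # (batch, n_agents, n_actions)
        value = self.value_head(h).mean(dim=1)     # centralized value estimate
        return logits, value

# The joint action is assembled by sampling one discrete action per agent,
# avoiding an explicit joint-action space of size n_actions ** n_agents.
policy = AttentionJointPolicy(obs_dim=10, n_actions=5)
obs = torch.randn(2, 8, 10)                        # batch of 2, 8 agents
logits, value = policy(obs)
actions = torch.distributions.Categorical(logits=logits).sample()  # (2, 8)
```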
ISSN:1864-5909
1864-5917
DOI:10.1007/s12065-022-00703-4