Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games
Published in: | arXiv.org 2023-06 |
---|---|
Format: | Article |
Language: | English |
Summary: | Mean-field games have been used as a theoretical tool to obtain approximate Nash equilibria for symmetric and anonymous \(N\)-player games. However, existing theoretical results assume variations of a "population generative model", which allows the learning algorithm to modify the population distribution arbitrarily; this assumption limits applicability. Moreover, learning algorithms typically work on abstract simulators of the population rather than on the \(N\)-player game itself. Instead, we show that \(N\) agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within \(\widetilde{\mathcal{O}}(\varepsilon^{-2})\) samples from a single sample trajectory, without a population generative model, up to the standard \(\mathcal{O}(\frac{1}{\sqrt{N}})\) error due to the mean-field approximation. Departing from the literature, instead of working with the best-response map, we first show that a policy mirror ascent map can be used to construct a contractive operator whose fixed point is the Nash equilibrium. We then analyze single-path TD learning for \(N\)-agent games, proving sample-complexity guarantees using only a single sample path from the \(N\)-agent simulator, again without a population generative model. Finally, we demonstrate that our methodology allows independent learning by the \(N\) agents with finite-sample guarantees. |
---|---|
ISSN: | 2331-8422 |
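
The summary above names two algorithmic ingredients: an entropy-regularized policy mirror ascent map whose fixed point is the Nash equilibrium, and TD learning along a single sample path. The sketch below is a minimal, illustrative rendering of those two ingredients in a tabular setting, not the paper's algorithm: the environment (`n_states`, `n_actions`, the kernel `P`, the rewards `R`) is a hypothetical stand-in for the mean-field simulator, the mean-field interaction itself is omitted, and all hyperparameters are illustrative.

```python
# Minimal sketch (not the paper's algorithm): entropy-regularized policy
# mirror ascent with Q-values estimated by TD(0) along a single sample
# path, i.e. no population generative model and no simulator resets.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3         # hypothetical problem size
gamma, tau = 0.95, 0.1             # discount factor, entropy-regularization weight
eta, alpha = 0.5, 0.05             # mirror-ascent step size, TD learning rate

# Hypothetical dynamics: a fixed random transition kernel and reward table
# standing in for the mean-field simulator (the population distribution
# would normally enter through the rewards/transitions).
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.random((n_states, n_actions))

def step(s, a):
    """Sample one transition of the single trajectory."""
    s_next = rng.choice(n_states, p=P[s, a])
    return R[s, a], s_next

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

pi = np.full((n_states, n_actions), 1.0 / n_actions)  # uniform initial policy
Q = np.zeros((n_states, n_actions))
s = 0
for t in range(20_000):
    a = rng.choice(n_actions, p=pi[s])
    r, s_next = step(s, a)
    # TD(0) target for the entropy-regularized value of the current policy,
    # computed from the single sample path.
    v_next = pi[s_next] @ (Q[s_next] - tau * np.log(pi[s_next] + 1e-12))
    Q[s, a] += alpha * (r + gamma * v_next - Q[s, a])
    s = s_next
    if (t + 1) % 500 == 0:
        # Mirror ascent step with the KL mirror map (multiplicative weights):
        # for entropy regularization tau this interpolates between the current
        # policy and the softmax of Q / tau.
        pi = softmax((np.log(pi + 1e-12) + eta * Q) / (1.0 + eta * tau))
```

Running the TD recursion for a stretch of the path between policy updates loosely mirrors the structure suggested by the summary: value estimation proceeds along one trajectory, while the policy is updated by the KL-mirror rule, whose repeated application the paper argues yields a contractive operator converging to the regularized Nash equilibrium.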