Loading…
Coordinated learning in multiagent MDPs with infinite state-space
In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem in two distinct subproblems: learning and coordination. To tackle the problem of learning, we survey Q -learning with sof...
Saved in:
Published in: | Autonomous agents and multi-agent systems 2010-11, Vol.21 (3), p.321-367 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Citations: | Items that this one cites Items that cite this one |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem in two distinct subproblems: learning and coordination. To tackle the problem of learning, we survey
Q
-learning with soft-state aggregation (
Q
-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361–368, 1994).
Q
-SSA allows the agents in the game to approximate the optimal
Q
-function, from which the optimal policies can be computed. We establish the convergence of
Q
-SSA and introduce a new result describing the rate of convergence of this method. In tackling the problem of coordination, we start by pointing out that the knowledge of the optimal
Q
-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given the knowledge of the optimal
Q
-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571–1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine
Q
-SSA with ABAP, this leading to a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of application. |
---|---|
ISSN: | 1387-2532 1573-7454 |
DOI: | 10.1007/s10458-009-9104-y |