Inverse reinforcement learning for decentralized non-cooperative multiagent systems

Bibliographic Details
Main Authors: Reddy, T. S., Gopikrishna, V., Zaruba, G., Huber, M.
Format: Conference Proceeding
Language: English
Description
Summary: The objective of inverse reinforcement learning (IRL) is to learn an agent's reward function from either the agent's policy or observations of its behavior. In this paper we address the problem of using inverse reinforcement learning to learn reward functions in a multiagent setting, where the agents may either cooperate or be strictly non-cooperative. The cooperative case is a special case of the non-cooperative setting in which the agents collectively maximize a common reward function instead of maximizing their individual reward functions. Here we present an IRL algorithm for the case where the policies of the agents are known. We use the framework described by Ng and Russell [2000] and extend it to a multiagent setting. We assume that the agents are rational and follow an optimal policy in the sense of a Nash equilibrium; these assumptions are common in multiagent systems. We show that, when the policies are known, the multiagent problem can be reduced to a distributed solution in which the reward function of each agent is recovered independently using a formulation very similar to that for the single-agent case.
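The reduction described in the abstract can be made concrete. In Ng and Russell's single-agent formulation, if P_pi is the transition matrix under the observed policy and P_a the matrix for an alternative action a, any reward R consistent with optimality must satisfy (P_pi - P_a)(I - gamma * P_pi)^{-1} R >= 0. The sketch below is illustrative only, not the paper's algorithm: it uses a simplified objective (total constraint slack rather than Ng and Russell's per-state min-slack with an L1 penalty), and all names here (single_agent_irl, P, policy, gamma, r_max) are assumptions introduced for the example.

```python
# Illustrative sketch of the per-agent reduction: with every agent's
# policy fixed, each agent's reward can be recovered by a single-agent
# linear program. Names and the simplified objective are assumptions,
# not taken from the paper.
import numpy as np
from scipy.optimize import linprog

def single_agent_irl(P, policy, gamma=0.9, r_max=1.0):
    """Recover a reward vector R for one agent with known policy.

    P      : array (n_actions, n_states, n_states) of transition matrices;
             in the multiagent reduction, the other agents' fixed
             equilibrium policies are already folded into these dynamics.
    policy : array (n_states,) giving the observed action in each state.
    """
    n_actions, n_states, _ = P.shape
    # Transition matrix under the observed policy.
    P_pi = np.array([P[policy[s], s] for s in range(n_states)])
    inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

    # Optimality constraints: (P_pi[s] - P[a, s]) @ inv @ R >= 0 for
    # every state s and every non-chosen action a. linprog expects
    # A_ub @ x <= b_ub, so each row is negated.
    rows = []
    for s in range(n_states):
        for a in range(n_actions):
            if a != policy[s]:
                rows.append(-(P_pi[s] - P[a, s]) @ inv)
    A_ub = np.vstack(rows)
    b_ub = np.zeros(A_ub.shape[0])

    # Simplified objective: maximize the summed constraint slack
    # (minimizing A_ub.sum(axis=0) @ R does exactly that), with a box
    # bound on R so the program stays bounded.
    c = A_ub.sum(axis=0)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(-r_max, r_max)] * n_states)
    return res.x

# Example with made-up dynamics: 2 actions, 3 states.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(2, 3))  # (n_actions, n_states, n_states)
policy = np.array([0, 1, 0])                # observed equilibrium actions
R = single_agent_irl(P, policy)
```

In the decentralized setting the abstract describes, a routine like this would be run once per agent, with the remaining agents' Nash-equilibrium policies absorbed into that agent's transition matrices, which is what makes the per-agent problems independent.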
ISSN: 1062-922X, 2577-1655
DOI: 10.1109/ICSMC.2012.6378020