
Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games

Bibliographic Details
Published in:	Journal of intelligent & robotic systems 2023-08, Vol.108 (4), p.65, Article 65
Main Authors:	Amhraoui, Elmehdi; Masrour, Tawfik
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Control; Electrical Engineering; Engineering; Games; Machine learning; Mechanical Engineering; Mechatronics; Multiagent systems; Robotics; Short Paper

Description
Summary:	In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed and non-communicative learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative over-generalization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as the miscoordination problem (also known as the Pareto selection problem) and the non-stationarity problem. The algorithm seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning. It does so by dynamically adjusting the learning rate according to the value of the temporal difference error, so that its behavior lies somewhere between the two. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning achieves the highest number of convergent trials among the compared algorithms. Unlike competing algorithms, Smooth Q-Learning is also easy to tune and does not require storing additional information.
ISSN:	0921-0296 1573-0409
DOI:	10.1007/s10846-023-01917-z
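
Illustration:	The summary describes a learning rate that depends on the temporal difference (TD) error and places the learner between average-based and maximum-based updates. This record does not give the paper's actual update rule, so the sketch below only illustrates the general idea. The function names, the default values, and in particular the sigmoid form of `smooth_rate` are assumptions made for this example, not the authors' formula.

```python
import math


def average_rate(delta, alpha=0.1):
    """Average-based learning: the same rate for every TD error."""
    return alpha


def maximum_rate(delta, alpha=0.1):
    """Maximum-based (optimistic) learning: negative TD errors are ignored."""
    return alpha if delta > 0 else 0.0


def smooth_rate(delta, alpha=0.1, sharpness=1.0):
    """Hypothetical smooth interpolation between the two rules above.

    ASSUMPTION: a sigmoid of the TD error is used purely for illustration;
    it is not taken from the paper. Large positive errors get a rate close
    to alpha, large negative errors get a rate close to 0.
    """
    # Clamp the exponent so math.exp cannot overflow for extreme errors.
    z = max(-60.0, min(60.0, sharpness * delta))
    return alpha / (1.0 + math.exp(-z))


def td_update(q, target, rate_fn):
    """Move the estimate q toward target with an error-dependent rate."""
    delta = target - q  # temporal difference error
    return q + rate_fn(delta) * delta


def run(rate_fn, targets, q0=0.0):
    """Apply td_update over a sequence of targets and return the final q."""
    q = q0
    for target in targets:
        q = td_update(q, target, rate_fn)
    return q


if __name__ == "__main__":
    # A noisy payoff that alternates between a high and a low value.
    targets = [10.0, 0.0] * 100
    for fn in (average_rate, smooth_rate, maximum_rate):
        print(fn.__name__, round(run(fn, targets), 3))
```

With an alternating payoff like the one in the demo, the average-based rule settles near the mean, the maximum-based rule climbs toward the largest payoff, and the smooth rule ends up between the two. How far it leans either way depends on the hypothetical `sharpness` parameter.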