
Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning

Bibliographic Details
Published in: arXiv.org, 2023-08
Main Authors: Nekoei, Hadi; Badrinaaraayanan, Akilesh; Sinha, Amit; Amini, Mohammad; Rajendran, Janarthanan; Mahajan, Aditya; Chandar, Sarath
Format: Article
Language: English
EISSN: 2331-8422
Subjects: Algorithms; Convergence; Deep learning; Machine learning; Multiagent systems; Policies; Time
Description: Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, where agents update their policies one after another in a sequence, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time, so it might not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning while also minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the EPyMARL (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning.