Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning
Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently…
Published in: | arXiv.org 2023-08 |
---|---|
Main Authors: | Nekoei, Hadi; Akilesh Badrinaaraayanan; Sinha, Amit; Amini, Mohammad; Janarthanan Rajendran; Mahajan, Aditya; Chandar, Sarath |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Convergence; Deep learning; Machine learning; Multiagent systems; Policies; Time |
Online Access: | Get full text |
cited_by | |
---|---|
cites | |
container_end_page | |
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Nekoei, Hadi; Akilesh Badrinaaraayanan; Sinha, Amit; Amini, Mohammad; Janarthanan Rajendran; Mahajan, Aditya; Chandar, Sarath |
description | Decentralized cooperative multi-agent deep reinforcement learning (MARL) can be a versatile learning framework, particularly in scenarios where centralized training is either not possible or not practical. One of the critical challenges in decentralized deep MARL is the non-stationarity of the learning environment when multiple agents are learning concurrently. A commonly used and efficient scheme for decentralized MARL is independent learning, in which agents concurrently update their policies independently of each other. We first show that independent learning does not always converge, while sequential learning, in which agents update their policies one after another, is guaranteed to converge to an agent-by-agent optimal solution. In sequential learning, when one agent updates its policy, all other agents' policies are kept fixed, alleviating the challenge of non-stationarity due to simultaneous updates in other agents' policies. However, it can be slow because only one agent is learning at any time, so it may not always be practical. In this work, we propose a decentralized cooperative MARL algorithm based on multi-timescale learning. In multi-timescale learning, all agents learn simultaneously, but at different learning rates. In our proposed method, when one agent updates its policy, other agents are allowed to update their policies as well, but at a slower rate. This speeds up sequential learning while also minimizing the non-stationarity caused by other agents updating concurrently. Multi-timescale learning outperforms state-of-the-art decentralized learning methods on a set of challenging multi-agent cooperative tasks in the EPyMARL (Papoudakis et al., 2020) benchmark. This can be seen as a first step towards more general decentralized cooperative deep MARL methods based on multi-timescale learning. (A toy sketch of the multi-timescale update scheme follows the record fields below.) |
format | article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2023-08 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2774004483 |
source | Publicly Available Content Database (Proquest) (PQ_SDU_P3) |
subjects | Algorithms; Convergence; Deep learning; Machine learning; Multiagent systems; Policies; Time |
title | Dealing With Non-stationarity in Decentralized Cooperative Multi-Agent Deep Reinforcement Learning via Multi-Timescale Learning |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T13%3A10%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Dealing%20With%20Non-stationarity%20in%20Decentralized%20Cooperative%20Multi-Agent%20Deep%20Reinforcement%20Learning%20via%20Multi-Timescale%20Learning&rft.jtitle=arXiv.org&rft.au=Nekoei,%20Hadi&rft.date=2023-08-17&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2774004483%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_journals_27740044833%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2774004483&rft_id=info:pmid/&rfr_iscdi=true |
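The description field above outlines multi-timescale learning only in words. The following is a minimal, self-contained sketch of the idea on a two-agent cooperative matrix game with independent Q-learners: all agents update on every episode, but only one agent at a time uses a fast learning rate while the others learn slowly. The learning rates, the rotation period, and the rule that the fast-learner role rotates periodically are illustrative assumptions for this toy example, not details taken from the paper.

```python
# Toy sketch of multi-timescale decentralized learning on a cooperative
# matrix game. FAST_LR, SLOW_LR, ROTATION_PERIOD and the periodic rotation
# of the fast-learner role are illustrative assumptions, not the paper's
# exact setup.
import numpy as np

N_AGENTS = 2
N_ACTIONS = 3
FAST_LR = 0.5          # learning rate of the currently fast agent
SLOW_LR = 0.05         # much slower rate for all other agents
ROTATION_PERIOD = 200  # episodes before the fast role moves to the next agent
EPSILON = 0.1

# Shared payoff: every agent receives the same reward (fully cooperative).
rng = np.random.default_rng(0)
payoff = rng.uniform(0, 1, size=(N_ACTIONS,) * N_AGENTS)

# Each agent keeps its own independent action-value estimates (decentralized).
q_values = [np.zeros(N_ACTIONS) for _ in range(N_AGENTS)]

for episode in range(5000):
    fast_agent = (episode // ROTATION_PERIOD) % N_AGENTS

    # Epsilon-greedy action selection, done independently by each agent.
    actions = []
    for q in q_values:
        if rng.random() < EPSILON:
            actions.append(int(rng.integers(N_ACTIONS)))
        else:
            actions.append(int(np.argmax(q)))

    reward = payoff[tuple(actions)]

    # All agents update simultaneously, but at different timescales: the
    # fast agent adapts quickly while the others drift slowly, so the
    # environment looks nearly stationary from the fast agent's viewpoint.
    for i, (q, a) in enumerate(zip(q_values, actions)):
        lr = FAST_LR if i == fast_agent else SLOW_LR
        q[a] += lr * (reward - q[a])

print("Greedy joint action:", [int(np.argmax(q)) for q in q_values])
print("Best joint action:  ", np.unravel_index(payoff.argmax(), payoff.shape))
```

The intuition behind the rotation is that the slowly-moving teammates approximate the fixed policies of sequential learning without ever freezing anyone, so every agent keeps making (slow) progress while the fast agent faces a nearly stationary environment.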