
Adaptive Multi-Agent Deep Mixed Reinforcement Learning for Traffic Light Control

Bibliographic Details
Published in: IEEE Transactions on Vehicular Technology, 2024-02, Vol. 73 (2), p. 1803-1816
Main Authors: Li, Lulu, Zhu, Ruijie, Wu, Shuning, Ding, Wenting, Xu, Mingliang, Lu, Jiwen
Format: Article
Language: English
Description
Summary: Despite significant advancements in Multi-Agent Deep Reinforcement Learning (MADRL) approaches for Traffic Light Control (TLC), effectively coordinating agents in diverse traffic environments remains a challenge. Studies of MADRL for TLC often focus on repeatedly constructing the same intersection models with sparse experience. However, real road networks comprise Multiple Types of Intersections (MTIs) rather than only four-way intersections. In a scenario with MTIs, each type of intersection has a distinctive topology and phase set, leading to disparities in state and action spaces. This article introduces Adaptive Multi-agent Deep Mixed Reinforcement Learning (AMDMRL) to address TLC tasks with multiple types of intersections. AMDMRL adopts a two-level hierarchy in which high-level proxies guide low-level agents in decision-making and updating, and all proxies are updated by value decomposition to obtain the globally optimal policy. Moreover, AMDMRL incorporates a mixed cooperative mechanism that uses a mixed encoder to aggregate information from correlated agents, enhancing cooperation among agents. We conduct comparative experiments involving four traditional and four DRL-based approaches on three training and four testing datasets. The results indicate that AMDMRL reduces average traveling time by 41% compared to traditional approaches and by 16% compared to DRL-based approaches on the three training datasets. During testing, AMDMRL achieves a 37% improvement in reward over the MADRL-based approaches.
ISSN: 0018-9545, 1939-9359
DOI: 10.1109/TVT.2023.3319698
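
The summary above outlines the AMDMRL architecture: a two-level hierarchy in which high-level proxies guide low-level agents, value decomposition toward a globally optimal policy, and a mixed encoder that aggregates information from correlated agents. The sketch below only illustrates those ideas under simplifying assumptions (additive, VDN-style mixing and mean-pooled neighbor encodings); it is not the authors' implementation, and every class and parameter name in it is hypothetical.

# Illustrative sketch only (PyTorch); names are hypothetical, not taken from the article.
# Per-intersection Q-networks with type-specific state/action sizes, a mixed encoder
# over correlated agents, and a simple additive (VDN-style) value-decomposition mixer.
import torch
import torch.nn as nn

class LowLevelAgent(nn.Module):
    """Per-intersection Q-network; state/action sizes differ by intersection type."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.q_head = nn.Linear(hidden, n_actions)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)          # local encoding of the intersection state
        return self.q_head(h), h         # per-action Q-values and the hidden encoding

class MixedEncoder(nn.Module):
    """Aggregates hidden encodings from correlated (e.g., neighboring) agents."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, own_h: torch.Tensor, neighbor_hs: list):
        # Mean-pool neighbor encodings, then fuse them with the agent's own encoding.
        agg = torch.stack(neighbor_hs).mean(dim=0) if neighbor_hs else torch.zeros_like(own_h)
        return torch.relu(self.proj(own_h + agg))

class ValueDecompositionMixer(nn.Module):
    """Combines per-agent chosen-action Q-values into one global value (additive here)."""
    def forward(self, agent_qs: torch.Tensor):
        return agent_qs.sum(dim=-1, keepdim=True)

# Toy usage: one three-way and one four-way intersection with different spaces.
agents = [LowLevelAgent(state_dim=12, n_actions=3),
          LowLevelAgent(state_dim=16, n_actions=4)]
mixed_encoder, mixer = MixedEncoder(), ValueDecompositionMixer()

states = [torch.randn(1, 12), torch.randn(1, 16)]
qs, hs = zip(*(agent(s) for agent, s in zip(agents, states)))

# Each agent fuses its own encoding with its correlated neighbors' encodings;
# in a fuller design this fused encoding would feed the Q-head.
fused = [mixed_encoder(hs[i], [h for j, h in enumerate(hs) if j != i]) for i in range(len(hs))]

# Greedy per-agent action values decomposed into one global value for joint updating.
chosen = torch.stack([q.max(dim=-1).values for q in qs], dim=-1)   # shape: (1, n_agents)
q_total = mixer(chosen)                                            # shape: (1, 1)

The per-type state_dim and n_actions arguments are the point of the sketch: each intersection type keeps its own state and action spaces, while the mixer still yields a single global value that can drive a joint update across agents.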