
Train timetabling with the general learning environment and multi-agent deep reinforcement learning

Bibliographic Details
Published in: Transportation research. Part B: methodological, 2022-03, Vol. 157, p. 230-251
Main Authors: Li, Wenqing; Ni, Shaoquan
Format: Article
Language: English
Description
Summary:
• A novel multi-agent deep reinforcement learning method for solving the train timetabling problem.
• A general environment that captures the system dynamics of single-track and double-track railway systems.
• A multi-agent actor-critic algorithm framework with centralized training and decentralized execution.
• A case study demonstrating the advantages of the proposed method over traditional counterparts.

This paper proposes a multi-agent deep reinforcement learning approach for the train timetabling problem of different railway systems. A general train timetabling learning environment is constructed to model the problem as a Markov decision process, in which the objectives and complex constraints of the problem can be distributed naturally and elegantly. Through subtle changes, the environment can be flexibly switched between the widely used double-track railway system and the more complex single-track railway system. To address the curse of dimensionality, a multi-agent actor-critic algorithm framework is proposed to decompose the large combinatorial decision space into multiple independent ones, which are parameterized by deep neural networks. The proposed approach was tested using a real-world instance and several test instances. Experimental results show that the proposed method obtains cooperative policies for the single-track train timetabling problem within a reasonable computing time, that these policies outperform several prevailing methods in terms of the optimality of solutions, and that the method can be easily generalized to the double-track train timetabling problem by changing the environment slightly.
ISSN: 0191-2615; 1879-2367
DOI: 10.1016/j.trb.2022.02.006
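
Illustrative sketch (not taken from the article): the summary describes an actor-critic framework with centralized training and decentralized execution, in which each agent's policy covers only its own part of the decision space. The Python toy below shows that general pattern only. It is not the authors' algorithm or environment: it uses small lookup tables in place of deep neural networks, a one-step decision in place of a full Markov decision process, and every name in it (Actor, CentralCritic, toy_reward, train) as well as the reward values are hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class Actor:
    """Decentralized policy for one train: sees only its own observation."""

    def __init__(self, n_obs, n_actions):
        self.logits = [[0.0] * n_actions for _ in range(n_obs)]

    def probs(self, obs):
        return softmax(self.logits[obs])

    def act(self, obs, rng):
        p = self.probs(obs)
        return rng.choices(range(len(p)), weights=p)[0]


class CentralCritic:
    """Centralized value table over joint observations (training only)."""

    def __init__(self):
        self.values = {}

    def value(self, joint_obs):
        return self.values.get(tuple(joint_obs), 0.0)

    def update(self, joint_obs, target, lr):
        key = tuple(joint_obs)
        v = self.values.get(key, 0.0)
        self.values[key] = v + lr * (target - v)


def toy_reward(actions):
    """Hypothetical reward: two trains share one single-track section.

    Action 1 = depart, action 0 = hold. Both departing is a conflict;
    each held train costs one unit of delay.
    """
    departing = sum(actions)
    if departing == 2:
        return -10.0
    return -1.0 * (2 - departing)


def train(episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    actors = [Actor(1, 2), Actor(1, 2)]
    critic = CentralCritic()
    joint_obs = (0, 0)  # a single dummy observation per train
    for _ in range(episodes):
        actions = [a.act(o, rng) for a, o in zip(actors, joint_obs)]
        reward = toy_reward(actions)
        # The critic scores the joint situation; each actor updates separately.
        advantage = reward - critic.value(joint_obs)
        critic.update(joint_obs, reward, lr)
        for actor, obs, action in zip(actors, joint_obs, actions):
            p = actor.probs(obs)
            for k in range(len(p)):
                grad = (1.0 if k == action else 0.0) - p[k]
                actor.logits[obs][k] += lr * advantage * grad
    return actors, critic
```

Calling train() returns the two actors and the critic. After training, only the actors would be needed to choose actions, each from its own observation, which is the decentralized-execution half of the pattern. With two trains and two actions each, the joint decision has 2 × 2 = 4 combinations while each actor handles only 2; that gap grows exponentially with the number of trains, which is the dimensionality issue the summary refers to.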