Design and calibration of a DRL algorithm for solving the job shop scheduling problem under unexpected job arrivals
Published in: Flexible Services and Manufacturing Journal, 2024-05
Format: Article
Language: English
Summary: This paper proposes a Deep Reinforcement Learning (DRL) based approach to solve the real-time Job Shop Scheduling Problem (JSSP) under unexpected job arrivals. The approach combines a DRL algorithm, the Proximal Policy Optimization Actor and Critic (PPO-AC) algorithm, with an event-driven rescheduling strategy to solve a bi-objective decision problem. PPO-AC models an agent interacting with its environment, aiming to achieve a predefined goal by maximizing the total cumulative reward. In this work, the total cumulative reward is defined as the opposite of the optimization objective function, which is expressed as the weighted sum of the generated schedule's completion time (efficiency criterion) and its deviation from an initially generated schedule (stability criterion). The agent thus minimizes the objective function by maximizing the total cumulative reward. To the best of our knowledge, no prior work has addressed scheduling stability while using DRL algorithms. A Graph Neural Network (GNN) architecture is exploited to model environment states, enhancing the approach's adaptability. Training experiments are conducted to calibrate the algorithm, and a sensitivity analysis on the deviation weight parameter evaluates the impact of its variation on the proposed model; results indicate that the model is robust to such variation. For a fixed deviation weight value, the algorithm is compared to CP Optimizer, IBM's constraint programming method, and a Mixed Integer Program, to assess its performance. Results reveal that for small batches of arriving jobs, PPO-AC succeeds in solving the problem in real time with low gaps to the optimal solution.
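As a rough illustration of the bi-objective reward described in the summary, the reward can be sketched as the negative weighted sum of the schedule's completion time and its deviation from the initial schedule. This is a minimal sketch under stated assumptions: the function name, the weight parameter `w_dev`, and the use of start-time differences as the deviation measure are illustrative, not the paper's exact formulation.

```python
def reward(completion_times, baseline_start_times, new_start_times, w_dev=0.5):
    """Hedged sketch of the bi-objective reward: the negative weighted sum of
    makespan (efficiency criterion) and total deviation from the initially
    generated schedule (stability criterion). Maximizing this reward is
    equivalent to minimizing the weighted objective function."""
    # Efficiency criterion: completion time of the generated schedule.
    makespan = max(completion_times)
    # Stability criterion (illustrative): total absolute shift of operation
    # start times relative to the initial schedule.
    deviation = sum(abs(n - b) for n, b in zip(new_start_times, baseline_start_times))
    # Reward is the opposite of the weighted-sum objective.
    return -((1 - w_dev) * makespan + w_dev * deviation)
```

With `w_dev = 0` the agent optimizes makespan only; increasing `w_dev` trades efficiency for schedule stability, which is the parameter varied in the paper's sensitivity analysis.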
ISSN: 1936-6582, 1936-6590
DOI: | 10.1007/s10696-024-09540-2 |