Using the proximal policy optimisation algorithm for solving the stochastic capacitated lot sizing problem

Bibliographic Details
Published in:	International Journal of Production Research, 2023-03, Vol. 61 (6), pp. 1955-1978
Main Authors:	van Hezewijk, Lotte, Dellaert, Nico, Van Woensel, Tom, Gademann, Noud
Format:	Article
Language:	English
Subjects:	Algorithms; Capacitated lot sizing problem; deep reinforcement learning; Dynamic programming; Inventory management; Lot sizing; Machine learning; Markov processes; multi-item; Optimization; Production planning; proximal policy optimisation; stochastic demand

Description
Summary:	This paper studies the multi-item stochastic capacitated lot-sizing problem with stationary demand, with the objective of minimising set-up, holding, and backorder costs. The problem is common in industry and concerns both inventory management and production planning. We study the applicability to this problem of the Proximal Policy Optimisation (PPO) algorithm, a Deep Reinforcement Learning (DRL) method. The problem is modelled as a Markov Decision Process (MDP), which can be solved to optimality for small problem instances using Dynamic Programming. In these settings, we show that the performance of PPO approaches the optimal solution. For larger problem instances with an increasing number of products, solving to optimality is intractable, and we demonstrate that the PPO solution outperforms the benchmark solution. Several adjustments to the standard PPO algorithm are implemented to make it more scalable to larger problem instances. We show that the computation time of the algorithm grows linearly, and we present a method for explaining its outcomes. Finally, we suggest future research directions that could improve the scalability and explainability of the PPO algorithm.
ISSN:	0020-7543, 1366-588X
DOI:	10.1080/00207543.2022.2056540
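The summary states that small instances of the MDP can be solved to optimality with Dynamic Programming. The sketch below is offered purely as an illustration of that idea. It runs value iteration on a hypothetical two-item toy instance in which both items share one production capacity. Every parameter value is an assumption made for this example and is not taken from the paper, and the same applies to the discounted-cost criterion (GAMMA), the bounded inventory range, and the truncation of inventory at those bounds. It is not the authors' model or code.

```python
import itertools
import math

# Hypothetical toy instance (assumed values, not from the paper).
ITEMS = 2
CAPACITY = 3                          # total units producible per period, shared by all items
MIN_INV, MAX_INV = -2, 4              # negative inventory represents backorders
DEMAND = {0: 1 / 3, 1: 1 / 3, 2: 1 / 3}  # stationary per-item demand distribution
SETUP, HOLD, BACKORDER = 5.0, 1.0, 4.0   # cost per set-up, per unit held, per unit backordered
GAMMA = 0.95                          # discount factor (assumed criterion for this sketch)

LEVELS = range(MIN_INV, MAX_INV + 1)
STATES = list(itertools.product(LEVELS, repeat=ITEMS))
# Feasible actions: production quantity per item, jointly within capacity.
ACTIONS = [q for q in itertools.product(range(CAPACITY + 1), repeat=ITEMS)
           if sum(q) <= CAPACITY]
# Joint demand outcomes with their probabilities (items assumed independent).
OUTCOMES = [(d, math.prod(DEMAND[x] for x in d))
            for d in itertools.product(DEMAND, repeat=ITEMS)]


def clamp(x):
    # Assumption: inventory is truncated to [MIN_INV, MAX_INV] to keep the state space finite.
    return max(MIN_INV, min(MAX_INV, x))


def period_cost(inv):
    # Holding cost on positive end-of-period stock, backorder cost on negative stock.
    return sum(HOLD * max(i, 0) + BACKORDER * max(-i, 0) for i in inv)


def q_value(state, action, V):
    # Set-up cost for each item produced, plus expected period cost and discounted future value.
    setup = SETUP * sum(1 for q in action if q > 0)
    expected = 0.0
    for d, prob in OUTCOMES:
        nxt = tuple(clamp(s + q - di) for s, q, di in zip(state, action, d))
        expected += prob * (period_cost(nxt) + GAMMA * V[nxt])
    return setup + expected


def value_iteration(tol=1e-4, max_iter=10_000):
    # Repeatedly apply the Bellman optimality operator until values stop changing.
    V = {s: 0.0 for s in STATES}
    for _ in range(max_iter):
        new_V = {s: min(q_value(s, a, V) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
    return V, policy
```

For example, `V, policy = value_iteration()` followed by `policy[(0, 0)]` gives the production quantities the toy model would choose when both items start with zero inventory. The joint state and action spaces grow exponentially with the number of items, which illustrates why exact Dynamic Programming only works for small instances.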