Modular production control using deep reinforcement learning: proximal policy optimization

Bibliographic Details
Published in:	Journal of intelligent manufacturing 2021-12, Vol.32 (8), p.2335-2351
Main Authors:	Mayer, Sebastian; Classen, Tobias; Endisch, Christian
Format:	Article
Language:	English
Subjects:	Algorithms; Automated guided vehicles; Automobile industry; Business and Management; Carbon dioxide; Control; Deep learning; Machine learning; Machines; Manufacturing; Mechatronics; Optimization; Processes; Production; Production controls; Robotics; Work stations; Workstations

Description
Summary:	EU regulations on CO2 limits and the trend of individualization are pushing the automotive industry towards greater flexibility and robustness in production. One approach to address these challenges is modular production, where workstations are decoupled by automated guided vehicles, requiring new control concepts. Modular production control aims at throughput-optimal coordination of products, workstations, and vehicles. For this NP-hard problem, conventional control approaches lack computing efficiency, do not find optimal solutions, or are not generalizable. In contrast, Deep Reinforcement Learning offers powerful and generalizable algorithms, able to deal with varying environments and high complexity. One of these algorithms is Proximal Policy Optimization, which is used in this article to address modular production control. Experiments in several modular production control settings demonstrate stable, reliable, optimal, and generalizable learning behavior. The agent successfully adapts its strategies with respect to the given problem configuration. We explain how to achieve this learning behavior, focusing especially on the agent’s action, state, and reward design.
ISSN:	0956-5515, 1572-8145
DOI:	10.1007/s10845-021-01778-z