
Safe reinforcement learning for industrial optimal control: A case study from metallurgical industry

Bibliographic Details
Published in: Information Sciences, 2023-11, Vol. 649, p. 119684, Article 119684
Main Authors: Zheng, Jun; Jia, Runda; Liu, Shaoning; He, Dakuo; Li, Kang; Wang, Fuli
Format: Article
Language: English
Description
Summary: Gold cyanide leaching is a critical step in the extraction of gold from ore. The desire for a higher leaching rate often leads to increased cyanide concentrations, which pose safety risks and raise the cost of waste treatment. To address this problem, this study introduces a novel safe reinforcement learning algorithm that satisfies joint chance constraints with a high probability for multi-constraint gold cyanide leaching processes. In particular, the proposed algorithm employs chance control barrier functions to maintain the state within the desired safe set with high probability and transforms the joint chance constraint into a cumulative cost form using a constraint relaxation method. This relaxation method guarantees the satisfaction of safety requirements within a specified time horizon. A surrogate objective function optimized by stochastic gradient ascent is derived to ensure monotonic improvement of the policy in the trust region. The augmented Lagrangian-based constrained policy optimization is utilized, converting the constrained optimization problem into an unconstrained saddle-point optimization problem and avoiding the periodic performance oscillations common in the general Lagrangian method. Case studies demonstrate that the proposed algorithm outperforms baseline algorithms in terms of policy improvement and constraint satisfaction, and operates safely in multi-constraint scenarios.
ISSN: 0020-0255
1872-6291
DOI: 10.1016/j.ins.2023.119684
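The abstract describes converting the constrained policy-optimization problem into an unconstrained saddle-point problem via an augmented Lagrangian. As a rough, generic illustration of that idea only (not the paper's algorithm: the toy quadratic objective, the cost limit, and all function names below are invented for the example), a minimal sketch might look like this:

```python
# Illustrative sketch only (not the authors' code): an augmented-Lagrangian
# treatment of a constrained policy-optimization problem, shrunk to a toy
# 2-parameter "policy". All names and numbers here are hypothetical stand-ins.
import numpy as np

def reward_surrogate(theta):
    # Stand-in for the surrogate objective to be maximized (higher is better).
    return -np.sum((theta - np.array([2.0, 1.0])) ** 2)

def cost_surrogate(theta):
    # Stand-in for the cumulative safety cost obtained after relaxing the
    # joint chance constraint; it must stay below cost_limit.
    return float(np.sum(theta ** 2))

def finite_diff_grad(f, theta, eps=1e-5):
    # Finite-difference gradient, standing in for stochastic policy gradients.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = eps
        g[i] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return g

def aug_lagrangian(theta, lam, rho, cost_limit):
    # Augmented Lagrangian for the inequality constraint c(theta) - d <= 0:
    #   -J(theta) + (1 / (2*rho)) * (max(0, lam + rho*(c - d))^2 - lam^2)
    violation = cost_surrogate(theta) - cost_limit
    penalty = (max(0.0, lam + rho * violation) ** 2 - lam ** 2) / (2.0 * rho)
    return -reward_surrogate(theta) + penalty

cost_limit = 2.0          # safety budget d
theta = np.zeros(2)       # toy policy parameters
lam, rho, step = 0.0, 2.0, 0.01

for _ in range(20):       # outer loop: dual (multiplier) updates
    for _ in range(200):  # inner loop: primal descent on the augmented Lagrangian
        g = finite_diff_grad(lambda t: aug_lagrangian(t, lam, rho, cost_limit), theta)
        theta = theta - step * g
    lam = max(0.0, lam + rho * (cost_surrogate(theta) - cost_limit))

print("theta:", theta, "cost:", cost_surrogate(theta), "lambda:", lam)
```

The quadratic penalty term together with the outer multiplier update is what distinguishes an augmented Lagrangian from plain Lagrangian ascent, and it is typically credited with damping the periodic performance oscillations the abstract mentions for the general Lagrangian method.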