Safe reinforcement learning for industrial optimal control: A case study from metallurgical industry
Published in: | Information sciences 2023-11, Vol.649, p.119684, Article 119684 |
---|---|
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Gold cyanide leaching is a critical step in the extraction of gold from ore. The desire for a higher leaching rate often leads to increased cyanide concentrations, which pose safety risks and raise the cost of waste treatment. To address this problem, this study introduces a novel safe reinforcement learning algorithm that satisfies joint chance constraints with high probability for multi-constraint gold cyanide leaching processes. In particular, the proposed algorithm employs chance control barrier functions to maintain the state within the desired safe set with high probability, and it transforms the joint chance constraint into a cumulative cost form using a constraint relaxation method. This relaxation method guarantees the satisfaction of safety requirements within a specified time horizon. A surrogate objective function, optimized by stochastic gradient ascent, is derived to ensure monotonic improvement of the policy within the trust region. Augmented Lagrangian-based constrained policy optimization is used to convert the constrained optimization problem into an unconstrained saddle-point optimization problem, avoiding the periodic performance oscillations common in the general Lagrangian method. Case studies demonstrate that the proposed algorithm outperforms baseline algorithms in terms of policy improvement and constraint satisfaction, and that it operates safely in multi-constraint scenarios. |
---|---|
ISSN: | 0020-0255 1872-6291 |
DOI: | 10.1016/j.ins.2023.119684 |
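
The summary mentions an augmented Lagrangian formulation that turns a constrained problem into an unconstrained saddle-point problem. The following is only a minimal sketch of that general idea on a toy scalar problem (minimize (x - 2)^2 subject to x <= 1); it is not the article's algorithm, and the function name, the toy objective, and all parameter values are assumptions chosen for illustration.

```python
def augmented_lagrangian_demo(rho=10.0, outer_iters=20, inner_iters=200, lr=0.01):
    """Toy augmented Lagrangian loop (hypothetical example, not from the article).

    Problem: minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.
    Augmented Lagrangian for an inequality constraint:
        L(x, lam) = f(x) + (max(0, lam + rho * g(x))^2 - lam^2) / (2 * rho)
    """
    x, lam = 0.0, 0.0
    for _ in range(outer_iters):
        # Inner loop: gradient descent on L with the multiplier held fixed.
        for _ in range(inner_iters):
            g = x - 1.0
            shifted = max(0.0, lam + rho * g)
            # dL/dx = f'(x) + max(0, lam + rho * g(x)) * g'(x), with g'(x) = 1
            grad = 2.0 * (x - 2.0) + shifted
            x -= lr * grad
        # Outer step: multiplier update, projected onto lam >= 0.
        lam = max(0.0, lam + rho * (x - 1.0))
    return x, lam


if __name__ == "__main__":
    x_opt, lam_opt = augmented_lagrangian_demo()
    print(f"x = {x_opt:.4f}, lambda = {lam_opt:.4f}")
```

On this toy problem the iterate should approach the constrained optimum x = 1 with multiplier close to 2. The quadratic penalty term is what damps the back-and-forth between primal and dual updates that a plain Lagrangian scheme can exhibit.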