Proximal policy optimization through a deep reinforcement learning framework for remedial action schemes of VSC-HVDC

Bibliographic Details
Published in: International Journal of Electrical Power & Energy Systems, 2023-08, Vol. 150, p. 109117, Article 109117
Main Authors: Song, Sungyoon, Jung, Yungun, Jang, Gilsoo, Jung, Seungmin
Format: Article
Language:English
Description
Summary:
• VSC-HVDC is used for the first time in a goal-oriented control scheme with a deep reinforcement learning model, thereby achieving better total voltage regulation.
• To improve training stability, proximal policy optimization, which adopts the trust-region concept, is used as the learning algorithm.
• To configure the reward functions, an iterative calculation of two parallel optimal power flows is implemented in the learning framework.
• Advanced remedial action schemes provide flexibility and redundancy even under unforeseen power changes from renewable energy.

A proximal policy optimization (PPO)-based back-to-back VSC-HVDC emergency control strategy based on a multi-agent deep reinforcement learning (DRL) approach is proposed for use in an energy management system (EMS). In this scheme, an advanced DRL algorithm is developed by implementing both PPO and a communication neural network for large power systems. The PPO agents, modeled as intelligent agents with objective functions, show higher convergence performance than existing DRL algorithms. Further, the model was demonstrated to effectively address voltage variations caused by the high penetration of renewable energy sources. By implementing PPO, the learning procedure is stabilized and made robust to continuous changes in network topology. To assess the effectiveness of the proposed algorithm, comprehensive case studies were conducted on standard test systems and the Korean power system, considering variations in load and PV generation and a weak centralized communication environment. The results indicate outstanding control performance and autonomously regulated bus voltages and line flows, thereby validating the effectiveness of the method.
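The trust-region concept that the summary attributes to PPO is commonly realized through a clipped surrogate objective, which bounds how far a single policy update can move. The sketch below is a minimal, hypothetical illustration of that per-sample objective; the function name and the toy numbers are assumptions for illustration, not taken from the paper's implementation.

```python
# Illustrative sketch of the PPO clipped surrogate objective (per sample).
# The names and values here are hypothetical, not the authors' code.

def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO per-sample objective: min(r*A, clip(r, 1-eps, 1+eps)*A).

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- estimated advantage A(s, a)
    eps       -- clip range limiting the effective policy step
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)

# A large ratio with positive advantage is clipped at 1 + eps, which
# caps the incentive to move the policy further (the trust-region idea).
print(ppo_clipped_objective(1.5, 2.0))   # clipped: 1.2 * 2.0 = 2.4
print(ppo_clipped_objective(0.9, 2.0))   # inside clip range: 1.8
```

Averaging this quantity over a batch and maximizing it with gradient ascent yields the stabilized update behavior the abstract credits for robustness to topology changes.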
ISSN:0142-0615
1879-3517
DOI:10.1016/j.ijepes.2023.109117