A Deep Reinforcement Learning-Based Energy Management Framework With Lagrangian Relaxation for Plug-In Hybrid Electric Vehicle

Bibliographic Details
Published in:	IEEE transactions on transportation electrification 2021-09, Vol.7 (3), p.1146-1160
Main Authors:	Zhang, Hailong; Peng, Jiankun; Tan, Huachun; Dong, Hanxuan; Ding, Fan
Format:	Article
Language:	English
Subjects:	Deep learning; Electric vehicles; Energy consumption; Energy management; Engines; Hybrid electric vehicles; Lagrangian relaxation; Markov processes; Neural networks; Optimization; Plug-in hybrid electric vehicle (PHEV); Reinforcement learning (RL); Safety; Safety management; Training; Training safety; Transportation

Description
Summary:	Reinforcement learning (RL)-based energy management is a current research hotspot for hybrid electric vehicles. Recent advances in RL-based energy management focus on energy-saving performance but give less consideration to the constrained setting needed for training safety. This article proposes an RL framework named coach-actor-double-critic (CADC) for optimizing energy management, formulated as a constrained Markov decision process (CMDP). A bilevel onboard controller comprises a neural network (NN)-based strategy actor and a rule-based strategy coach for online energy management. Whenever the actor's output falls outside the constrained range of feasible solutions, the coach takes over energy management to ensure safety. Using Lagrangian relaxation, the CMDP optimization is transformed into an unconstrained dual problem that minimizes energy consumption while also minimizing coach participation. The actor's parameters are updated by policy gradient through RL training with the Lagrangian value function. Two critics with the same structure estimate the value function synchronously to avoid overestimation bias. Several experiments with bus trajectory data demonstrate the optimality, self-learning ability, and adaptability of CADC. The results indicate that CADC outperforms existing RL-based strategies and achieves more than 95% of the energy-saving rate of the offline global optimum.
ISSN:	2332-7782, 2577-4212
DOI:	10.1109/TTE.2020.3043239
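
Illustration:	The summary describes three mechanisms: a rule-based coach that overrides infeasible actor outputs, a Lagrangian relaxation that folds coach participation into the objective, and two critics used together to limit overestimation. The Python sketch below shows one way these pieces could fit together. It is not the authors' implementation. All function names, the scalar action, the fixed feasible interval, the dual-ascent multiplier update, the `budget` parameter, and the use of the smaller of the two critic estimates are assumptions made for illustration; the record does not specify any of them.

```python
from dataclasses import dataclass


@dataclass
class CoachResult:
    action: float     # action actually applied to the powertrain
    intervened: bool  # True if the rule-based coach overrode the actor


def coach_filter(actor_action, low, high, fallback_action):
    """Apply the actor's action if it is feasible, else the coach's action.

    Assumption: feasibility is a simple interval [low, high] and the coach
    supplies a single rule-based fallback action.
    """
    if low <= actor_action <= high:
        return CoachResult(actor_action, False)
    return CoachResult(fallback_action, True)


def lagrangian_reward(energy_cost, intervened, lam):
    """Unconstrained reward: energy cost plus lam times coach participation.

    Assumption: coach participation is counted as 0 or 1 per step.
    """
    return -(energy_cost + lam * float(intervened))


def update_multiplier(lam, intervention_rate, budget, step_size):
    """Dual ascent on the multiplier, projected onto lam >= 0.

    Assumption: `budget` is a hypothetical tolerated intervention rate.
    """
    return max(0.0, lam + step_size * (intervention_rate - budget))


def double_critic_target(reward, q1_next, q2_next, gamma):
    """Bootstrapped target that uses the smaller of two critic estimates.

    Assumption: this min rule is one common way to curb overestimation.
    """
    return reward + gamma * min(q1_next, q2_next)
```

In this sketch, a step in which the actor proposes 1.5 against a feasible range of 0.0 to 1.0 is replaced by the coach's fallback, and the resulting intervention flag lowers the reward by the current multiplier, so the actor is pushed toward outputs that stay feasible on their own.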