
Risk-Sensitive RL with Optimized Certainty Equivalents via Reduction to Standard RL


Bibliographic Details
Published in: arXiv.org 2024-03
Main Authors: Wang, Kaiwen, Liang, Dawen, Kallus, Nathan, Sun, Wen
Format: Article
Language: English
Description
Summary: We study Risk-Sensitive Reinforcement Learning (RSRL) with the Optimized Certainty Equivalent (OCE) risk, which generalizes Conditional Value-at-Risk (CVaR), entropic risk, and Markowitz's mean-variance. Using an augmented Markov Decision Process (MDP), we propose two general meta-algorithms via reductions to standard RL: one based on optimistic algorithms and another based on policy optimization. Our optimistic meta-algorithm generalizes almost all prior RSRL theory with entropic risk or CVaR. Under discrete rewards, our optimistic theory also certifies the first RSRL regret bounds for MDPs with bounded coverability, e.g., exogenous block MDPs. Under discrete rewards, our policy optimization meta-algorithm enjoys both global convergence and local improvement guarantees in a novel metric that lower bounds the true OCE risk. Finally, we instantiate our framework with PPO, construct an MDP, and show that it learns the optimal risk-sensitive policy while prior algorithms provably fail.
ISSN: 2331-8422
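
For reference, the risk measure at the center of this abstract, the Optimized Certainty Equivalent, has a standard definition in the risk-measure literature (Ben-Tal and Teboulle). The sketch below states that definition and the utility functions that recover the three special cases named in the abstract; the paper's own sign and normalization conventions may differ.

\[
  \mathrm{OCE}_u(X) \;=\; \sup_{\lambda \in \mathbb{R}} \Big\{ \lambda + \mathbb{E}\big[\, u(X - \lambda) \,\big] \Big\},
\]
where $u$ is a concave, non-decreasing utility with $u(0) = 0$ and $1 \in \partial u(0)$. Choosing $u(x) = \tfrac{1}{\tau}\min(x, 0)$ recovers $\mathrm{CVaR}_\tau$ (the Rockafellar--Uryasev representation); $u(x) = \tfrac{1}{\beta}\,(1 - e^{-\beta x})$ recovers the entropic risk $-\tfrac{1}{\beta}\log \mathbb{E}\big[e^{-\beta X}\big]$; and $u(x) = x - c\,x^{2}$ recovers the mean-variance objective $\mathbb{E}[X] - c\,\mathrm{Var}(X)$.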