Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigm...

Bibliographic Details
Published in: Synthese (Dordrecht), 2021-11, Vol. 198 (Suppl 27), p. S6435-S6467
Main Authors: Everitt, Tom; Hutter, Marcus; Kumar, Ramana; Krakovna, Victoria
Format: Article
Language: English