Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigm...


Bibliographic Details
Published in:	Synthese (Dordrecht) 2021-11, Vol.198 (Suppl 27), p.S6435-S6467
Main Authors:	Everitt, Tom, Hutter, Marcus, Kumar, Ramana, Krakovna, Victoria
Format:	Article
Language:	English
Subjects:	Algorithms; Causality; Decision theory; Education; Epistemology; Learning; Logic; Metaphysics; Philosophy; Philosophy of Language; Philosophy of Science; Reinforcement; S.I.: DECTHEORY&FUTOFAI; Social networks
