Reward tampering problems and solutions in reinforcement learning: a causal influence diagram perspective
Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding? Or will sufficiently capable RL agents always find ways to bypass their intended objectives by shortcutting their reward signal? This question impacts how far RL can be scaled, and whether alternative paradigm...
| Published in: | Synthese (Dordrecht) 2021-11, Vol.198 (Suppl 27), p.S6435-S6467 |
|---|---|
| Format: | Article |
| Language: | English |