Statistical information about reward timing is insufficient for promoting optimal persistence decisions
Published in: Cognition 2023-08, Vol. 237, Article 105468
Main Authors: , , , , ,
Format: Article
Language: English
Summary: When deciding how long to keep waiting for delayed rewards that will arrive at an uncertain time, different distributions of possible reward times dictate different optimal strategies for maximizing reward. When reward timing distributions are heavy-tailed (e.g., waiting on hold), there is a point at which waiting is no longer advantageous because the opportunity cost of waiting is too high. Alternatively, when reward timing distributions have more predictable timing (e.g., uniform), it is advantageous to wait as long as necessary for the reward. Although people learn to approximate optimal strategies, little is known about how this learning occurs. One possibility is that people learn a general cognitive representation of the probability distribution that governs reward timing and then infer a strategy from that model of the environment. Another possibility is that they learn an action policy in a way that depends more narrowly on direct task experience, such that general knowledge of the reward timing distribution is insufficient for expressing the optimal strategy. Here, in a series of studies in which participants decided how long to persist for delayed rewards before quitting, we provided participants with information about the reward timing distribution in several ways. Whether the information was provided through counterfactual feedback (Study 1), previous exposure (Studies 2a and 2b), or description (Studies 3a and 3b), it did not obviate the need for direct, feedback-driven learning in a decision context. Therefore, learning when to quit waiting for delayed rewards might depend on task-specific experience, not solely on probabilistic reasoning.
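The contrast the summary draws between heavy-tailed and predictable reward timing can be illustrated with a small simulation. This is an illustrative sketch only: the specific distributions, the inter-trial delay, and the policy grid below are assumptions for demonstration, not the parameters used in the studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Illustrative reward-timing distributions (assumed, not from the paper):
heavy_tailed = rng.pareto(1.5, n) * 2.0   # heavy-tailed delays, e.g., "waiting on hold"
predictable = rng.uniform(0.0, 20.0, n)   # bounded, predictable delays

ITI = 2.0  # assumed fixed delay between trials after quitting or collecting a reward

def reward_rate(delays, limit):
    """Long-run rewards per unit time under the policy:
    wait up to `limit`, then give up and start the next trial."""
    p_rewarded = np.mean(delays <= limit)
    mean_trial_time = np.minimum(delays, limit).mean() + ITI
    return p_rewarded / mean_trial_time

# Find the best quitting time on a grid of candidate limits
limits = np.linspace(0.5, 20.0, 40)
best_heavy = limits[np.argmax([reward_rate(heavy_tailed, L) for L in limits])]
best_pred = limits[np.argmax([reward_rate(predictable, L) for L in limits])]

print(f"best quitting time, heavy-tailed distribution: {best_heavy:.1f}")
print(f"best quitting time, predictable distribution:  {best_pred:.1f}")
```

Under these assumptions, the reward rate for the heavy-tailed distribution peaks at a finite quitting time, whereas for the uniform distribution the best policy on the grid is to wait out the maximum possible delay. This is the distribution-dependent distinction between strategies that participants in the studies had to learn.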
ISSN: 0010-0277, 1873-7838
DOI: 10.1016/j.cognition.2023.105468