Loading…

Dynamic resource allocation during reinforcement learning accounts for ramping and phasic dopamine activity

For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially importan...

Full description

Saved in:

Bibliographic Details
Published in:	Neural networks 2020-06, Vol.126, p.95-107
Main Authors:	Song, Minryung R., Lee, Sang Wan
Format:	Article
Language:	English
Subjects:	Habit Pearce-Hall model Prediction error Salience Striatum Temporal-difference learning model
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	For an animal to learn about its environment with limited motor and cognitive resources, it should focus its resources on potentially important stimuli. However, too narrow focus is disadvantageous for adaptation to environmental changes. Midbrain dopamine neurons are excited by potentially important stimuli, such as reward-predicting or novel stimuli, and allocate resources to these stimuli by modulating how an animal approaches, exploits, explores, and attends. The current study examined the theoretical possibility that dopamine activity reflects the dynamic allocation of resources for learning. Dopamine activity may transition between two patterns: (1) phasic responses to cues and rewards, and (2) ramping activity arising as the agent approaches the reward. Phasic excitation has been explained by prediction errors generated by experimentally inserted cues. However, when and why dopamine activity transitions between the two patterns remain unknown. By parsimoniously modifying a standard temporal difference (TD) learning model to accommodate a mixed presentation of both experimental and environmental stimuli, we simulated dopamine transitions and compared them with experimental data from four different studies. The results suggested that dopamine transitions from ramping to phasic patterns as the agent focuses its resources on a small number of reward-predicting stimuli, thus leading to task dimensionality reduction. The opposite occurs when the agent re-distributes its resources to adapt to environmental changes, resulting in task dimensionality expansion. This research elucidates the role of dopamine in a broader context, providing a potential explanation for the diverse repertoire of dopamine activity that cannot be explained solely by prediction error.
ISSN:	0893-6080 1879-2782
DOI:	10.1016/j.neunet.2020.03.005