
Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task

Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high vs. the long-term goal of sobriety. We use a computational model to identify learning and decision...

Bibliographic Details
Published in: Frontiers in psychology, 2015-12, Vol. 6, p. 1910
Main Authors: Harlé, Katia M, Zhang, Shunan, Schiff, Max, Mackey, Scott, Paulus, Martin P, Yu, Angela J
Format: Article
Language: English
Subjects:
container_end_page 1910
container_issue
container_start_page 1910
container_title Frontiers in psychology
container_volume 6
creator Harlé, Katia M
Zhang, Shunan
Schiff, Max
Mackey, Scott
Paulus, Martin P
Yu, Angela J
description Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high vs. the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n = 16) vs. healthy control subjects (HCS, n = 16), in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice not only yields potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component, the updating of estimated reward rates based on ongoing observations, and a decision-making component, the choice among options based on current beliefs and uncertainties about reward rates. We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-stay/Lose-shift (WSLS), ε-Greedy, τ-Switch, Softmax, Knowledge Gradient. HCS and MDI significantly differ in how they learn about reward rates and use them to make decisions. HCS learn from past observations but weigh recent data more, and their decision policy is best fit as Softmax. MDI are more likely to follow the simple learning-independent policy of WSLS, and among MDI best fit by Softmax, they have more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. Neurally, MDI's tendency to avoid the most rewarding option is associated with a lower gray matter volume of the thalamic dorsal lateral nucleus. More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.
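The learning and decision components described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the grid discretization, the uniform prior, the parameter names gamma and beta, the exponential form of the softmax rule, and the choice to update only the chosen arm are all assumptions made for illustration. The belief update follows one common reading of a Dynamic Belief Model, in which the reward rate is assumed to persist with probability gamma and to be redrawn from the prior otherwise, so that recent observations weigh more.

```python
import math
import random

# Hypothetical discretization: candidate reward rates strictly inside (0, 1).
GRID = [(i + 0.5) / 50 for i in range(50)]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def uniform_prior():
    # Assumption: a flat prior; the paper fits prior beliefs per subject.
    return normalize([1.0] * len(GRID))


def dbm_update(belief, prior, reward, gamma):
    """One belief update for a single arm after a Bernoulli outcome."""
    likelihood = [p if reward else 1.0 - p for p in GRID]
    posterior = normalize([b * l for b, l in zip(belief, likelihood)])
    # With probability (1 - gamma) the rate is assumed redrawn from the prior.
    return [gamma * q + (1.0 - gamma) * p0 for q, p0 in zip(posterior, prior)]


def mean_rate(belief):
    return sum(p * b for p, b in zip(GRID, belief))


def softmax_choice(means, beta, rng):
    """Pick arm 0 or 1 with probability proportional to exp(beta * mean)."""
    probs = normalize([math.exp(beta * m) for m in means])
    return 0 if rng.random() < probs[0] else 1


def wsls_choice(prev_arm, prev_reward, rng):
    """Win-stay/Lose-shift: repeat after a reward, switch otherwise."""
    if prev_arm is None:
        return rng.randrange(2)
    return prev_arm if prev_reward else 1 - prev_arm


def run(policy, true_rates, n_trials, rng, gamma=0.8, beta=5.0):
    """Simulate one two-armed bandit game and return the total reward."""
    prior = uniform_prior()
    beliefs = [prior[:], prior[:]]
    prev_arm, prev_reward, total = None, None, 0
    for _ in range(n_trials):
        if policy == "wsls":
            arm = wsls_choice(prev_arm, prev_reward, rng)
        else:
            arm = softmax_choice([mean_rate(b) for b in beliefs], beta, rng)
        reward = rng.random() < true_rates[arm]
        # Simplification: only the chosen arm's belief is updated.
        beliefs[arm] = dbm_update(beliefs[arm], prior, reward, gamma)
        prev_arm, prev_reward = arm, reward
        total += int(reward)
    return total


if __name__ == "__main__":
    for name in ("wsls", "softmax"):
        print(name, run(name, (0.8, 0.2), 50, random.Random(0)))
```

In this sketch WSLS ignores the learned beliefs entirely, which mirrors the abstract's description of it as a learning-independent policy, while the softmax rule depends on the estimated reward rates and a single inverse-temperature parameter.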
doi_str_mv 10.3389/fpsyg.2015.01910
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_51e52be5c35049dd8f26ebeaaa5cc07e</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_51e52be5c35049dd8f26ebeaaa5cc07e</doaj_id><sourcerecordid>1754524515</sourcerecordid><originalsourceid>FETCH-LOGICAL-c528t-9865562b6679de4407ac4f4bc5ad9a4e419bdad39b200e95d1c6dd9fe8180e783</originalsourceid><addsrcrecordid>eNpVkUFv1DAQhSMEolXpnRPKkUsWO7GdmAPS0haotBUHlrM1sSe7bhM72N6i_nu8u6VqffFo_OYbP72ieE_Jomk6-WmY48NmURPKF4RKSl4Vp1QIVlHSdq-f1SfFeYy3JB9GakLqt8VJLdqmkUScFrvlmDCgKX8lSDYmq2EsVwjBWbcpwZnyErWN1rvqBu72PevKG0xbmOYtJpiswyyZ0Rl0Gj-XV_f2UJVD8FMJ5fqvr5Zhyhu-ZppN5Rri3bvizQBjxPPH-6z4_e1qffGjWv38fn2xXFWa112qZCc4F3UvRCsNMkZa0GxgveZgJDBkVPYGTCP77AslN1QLY-SAHe0Itl1zVlwfucbDrZqDnSA8KA9WHRo-bBSE7HlExSnyukeuG06YNKYbaoE9AgDXmrSYWV-OrHnXZzsaXQowvoC-fHF2qzb-XjHRNTmfDPj4CAj-zw5jUpONGscRHPpdVLTljNeMU56l5CjVwccYcHhaQ4nah68O4at9-OoQfh758Px7TwP_o27-AZJ5rcQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1754524515</pqid></control><display><type>article</type><title>Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task</title><source>Open Access: PubMed Central</source><creator>Harlé, Katia M ; Zhang, Shunan ; Schiff, Max ; Mackey, Scott ; Paulus, Martin P ; Yu, Angela J</creator><creatorcontrib>Harlé, Katia M ; Zhang, Shunan ; Schiff, Max ; Mackey, Scott ; Paulus, Martin P ; Yu, Angela J</creatorcontrib><description>Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high vs. the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n = 16) vs. 
healthy control subjects (HCS, n = 16), in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice not only yields potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component, the updating of estimated reward rates based on ongoing observations, and a decision-making component, the choice among options based on current beliefs and uncertainties about reward rates. We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-stay/Lose-shift (WSLS), ε-Greedy, τ-Switch, Softmax, Knowledge Gradient. HCS and MDI significantly differ in how they learn about reward rates and use them to make decisions. HCS learn from past observations but weigh recent data more, and their decision policy is best fit as Softmax. MDI are more likely to follow the simple learning-independent policy of WSLS, and among MDI best fit by Softmax, they have more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. Neurally, MDI's tendency to avoid the most rewarding option is associated with a lower gray matter volume of the thalamic dorsal lateral nucleus. 
More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.</description><identifier>ISSN: 1664-1078</identifier><identifier>EISSN: 1664-1078</identifier><identifier>DOI: 10.3389/fpsyg.2015.01910</identifier><identifier>PMID: 26733906</identifier><language>eng</language><publisher>Switzerland: Frontiers Media S.A</publisher><subject>Addiction ; Bayesian model ; decision-making ; methamphetamine stimulant ; multi-armed bandit task ; Psychology ; reward processing</subject><ispartof>Frontiers in psychology, 2015-12, Vol.6, p.1910-1910</ispartof><rights>Copyright © 2015 Harlé, Zhang, Schiff, Mackey, Paulus and Yu. 2015 Harlé, Zhang, Schiff, Mackey, Paulus and Yu</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c528t-9865562b6679de4407ac4f4bc5ad9a4e419bdad39b200e95d1c6dd9fe8180e783</citedby><cites>FETCH-LOGICAL-c528t-9865562b6679de4407ac4f4bc5ad9a4e419bdad39b200e95d1c6dd9fe8180e783</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4683191/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4683191/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27903,27904,53769,53771</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26733906$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Harlé, Katia M</creatorcontrib><creatorcontrib>Zhang, Shunan</creatorcontrib><creatorcontrib>Schiff, Max</creatorcontrib><creatorcontrib>Mackey, Scott</creatorcontrib><creatorcontrib>Paulus, Martin P</creatorcontrib><creatorcontrib>Yu, Angela 
J</creatorcontrib><title>Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task</title><title>Frontiers in psychology</title><addtitle>Front Psychol</addtitle><description>Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users need to balance the appeal of an immediate high vs. the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n = 16) vs. healthy control subjects (HCS, n = 16), in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice not only yields potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component, the updating of estimated reward rates based on ongoing observations, and a decision-making component, the choice among options based on current beliefs and uncertainties about reward rates. We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-stay/Lose-shift (WSLS), ε-Greedy, τ-Switch, Softmax, Knowledge Gradient. HCS and MDI significantly differ in how they learn about reward rates and use them to make decisions. HCS learn from past observations but weigh recent data more, and their decision policy is best fit as Softmax. MDI are more likely to follow the simple learning-independent policy of WSLS, and among MDI best fit by Softmax, they have more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. 
Neurally, MDI's tendency to avoid the most rewarding option is associated with a lower gray matter volume of the thalamic dorsal lateral nucleus. More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.</description><subject>Addiction</subject><subject>Bayesian model</subject><subject>decision-making</subject><subject>methamphetamine stimulant</subject><subject>multi-armed bandit task</subject><subject>Psychology</subject><subject>reward processing</subject><issn>1664-1078</issn><issn>1664-1078</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNpVkUFv1DAQhSMEolXpnRPKkUsWO7GdmAPS0haotBUHlrM1sSe7bhM72N6i_nu8u6VqffFo_OYbP72ieE_Jomk6-WmY48NmURPKF4RKSl4Vp1QIVlHSdq-f1SfFeYy3JB9GakLqt8VJLdqmkUScFrvlmDCgKX8lSDYmq2EsVwjBWbcpwZnyErWN1rvqBu72PevKG0xbmOYtJpiswyyZ0Rl0Gj-XV_f2UJVD8FMJ5fqvr5Zhyhu-ZppN5Rri3bvizQBjxPPH-6z4_e1qffGjWv38fn2xXFWa112qZCc4F3UvRCsNMkZa0GxgveZgJDBkVPYGTCP77AslN1QLY-SAHe0Itl1zVlwfucbDrZqDnSA8KA9WHRo-bBSE7HlExSnyukeuG06YNKYbaoE9AgDXmrSYWV-OrHnXZzsaXQowvoC-fHF2qzb-XjHRNTmfDPj4CAj-zw5jUpONGscRHPpdVLTljNeMU56l5CjVwccYcHhaQ4nah68O4at9-OoQfh758Px7TwP_o27-AZJ5rcQ</recordid><startdate>20151218</startdate><enddate>20151218</enddate><creator>Harlé, Katia M</creator><creator>Zhang, Shunan</creator><creator>Schiff, Max</creator><creator>Mackey, Scott</creator><creator>Paulus, Martin P</creator><creator>Yu, Angela J</creator><general>Frontiers Media S.A</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20151218</creationdate><title>Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task</title><author>Harlé, Katia M ; Zhang, Shunan ; Schiff, Max ; Mackey, Scott ; Paulus, Martin P ; Yu, Angela 
J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c528t-9865562b6679de4407ac4f4bc5ad9a4e419bdad39b200e95d1c6dd9fe8180e783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Addiction</topic><topic>Bayesian model</topic><topic>decision-making</topic><topic>methamphetamine stimulant</topic><topic>multi-armed bandit task</topic><topic>Psychology</topic><topic>reward processing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Harlé, Katia M</creatorcontrib><creatorcontrib>Zhang, Shunan</creatorcontrib><creatorcontrib>Schiff, Max</creatorcontrib><creatorcontrib>Mackey, Scott</creatorcontrib><creatorcontrib>Paulus, Martin P</creatorcontrib><creatorcontrib>Yu, Angela J</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Open Access: DOAJ - Directory of Open Access Journals</collection><jtitle>Frontiers in psychology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Harlé, Katia M</au><au>Zhang, Shunan</au><au>Schiff, Max</au><au>Mackey, Scott</au><au>Paulus, Martin P</au><au>Yu, Angela J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task</atitle><jtitle>Frontiers in psychology</jtitle><addtitle>Front Psychol</addtitle><date>2015-12-18</date><risdate>2015</risdate><volume>6</volume><spage>1910</spage><epage>1910</epage><pages>1910-1910</pages><issn>1664-1078</issn><eissn>1664-1078</eissn><abstract>Understanding how humans weigh long-term and short-term goals is important for both basic cognitive science and clinical neuroscience, as substance users 
need to balance the appeal of an immediate high vs. the long-term goal of sobriety. We use a computational model to identify learning and decision-making abnormalities in methamphetamine-dependent individuals (MDI, n = 16) vs. healthy control subjects (HCS, n = 16), in a two-armed bandit task. In this task, subjects repeatedly choose between two arms with fixed but unknown reward rates. Each choice not only yields potential immediate reward but also information useful for long-term reward accumulation, thus pitting exploration against exploitation. We formalize the task as comprising a learning component, the updating of estimated reward rates based on ongoing observations, and a decision-making component, the choice among options based on current beliefs and uncertainties about reward rates. We model the learning component as iterative Bayesian inference (the Dynamic Belief Model), and the decision component using five competing decision policies: Win-stay/Lose-shift (WSLS), ε-Greedy, τ-Switch, Softmax, Knowledge Gradient. HCS and MDI significantly differ in how they learn about reward rates and use them to make decisions. HCS learn from past observations but weigh recent data more, and their decision policy is best fit as Softmax. MDI are more likely to follow the simple learning-independent policy of WSLS, and among MDI best fit by Softmax, they have more pessimistic prior beliefs about reward rates and are less likely to choose the option estimated to be most rewarding. Neurally, MDI's tendency to avoid the most rewarding option is associated with a lower gray matter volume of the thalamic dorsal lateral nucleus. More broadly, our work illustrates the ability of our computational framework to help reveal subtle learning and decision-making abnormalities in substance use.</abstract><cop>Switzerland</cop><pub>Frontiers Media S.A</pub><pmid>26733906</pmid><doi>10.3389/fpsyg.2015.01910</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1664-1078
ispartof Frontiers in psychology, 2015-12, Vol.6, p.1910-1910
issn 1664-1078
1664-1078
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_51e52be5c35049dd8f26ebeaaa5cc07e
source Open Access: PubMed Central
subjects Addiction
Bayesian model
decision-making
methamphetamine stimulant
multi-armed bandit task
Psychology
reward processing
title Altered Statistical Learning and Decision-Making in Methamphetamine Dependence: Evidence from a Two-Armed Bandit Task
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T07%3A25%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Altered%20Statistical%20Learning%20and%20Decision-Making%20in%20Methamphetamine%20Dependence:%20Evidence%20from%20a%20Two-Armed%20Bandit%20Task&rft.jtitle=Frontiers%20in%20psychology&rft.au=Harl%C3%A9,%20Katia%20M&rft.date=2015-12-18&rft.volume=6&rft.spage=1910&rft.epage=1910&rft.pages=1910-1910&rft.issn=1664-1078&rft.eissn=1664-1078&rft_id=info:doi/10.3389/fpsyg.2015.01910&rft_dat=%3Cproquest_doaj_%3E1754524515%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c528t-9865562b6679de4407ac4f4bc5ad9a4e419bdad39b200e95d1c6dd9fe8180e783%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1754524515&rft_id=info:pmid/26733906&rfr_iscdi=true