Deep Reinforcement Learning of Marked Temporal Point Processes
In a wide variety of applications, humans interact with a complex environment by means of asynchronous stochastic discrete events in continuous time. Can we design online interventions that will help humans achieve certain goals in such an asynchronous setting? In this paper, we address the above problem from the perspective of deep reinforcement learning of marked temporal point processes, where both the actions taken by an agent and the feedback it receives from the environment are asynchronous stochastic discrete events characterized using marked temporal point processes. In doing so, we define the agent's policy using the intensity and mark distribution of the corresponding process and then derive a flexible policy gradient method, which embeds the agent's actions and the feedback it receives into real-valued vectors using deep recurrent neural networks. Our method does not make any assumptions on the functional form of the intensity and mark distribution of the feedback, and it allows for arbitrarily complex reward functions. We apply our methodology to two different applications in personalized teaching and viral marketing and, using data gathered from Duolingo and Twitter, we show that it may be able to find interventions to help learners and marketers achieve their goals more effectively than alternatives.
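To make the abstract's construction concrete, the minimal sketch below shows one way a marked temporal point process policy of this general shape could look: a recurrent network summarizes the event history, an intensity head and a mark head parameterize when the next action fires and which mark it carries, and a REINFORCE-style policy gradient maximizes an episode reward. This is not the paper's implementation; the class names, the piecewise-constant (exponential inter-event time) intensity, and the toy reward are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' code): an RNN-parameterized marked
# temporal point process policy trained with a REINFORCE-style policy gradient.
# Assumes exponential inter-event times (constant intensity between events) and
# a placeholder reward, purely to keep the example short and runnable.
import torch
import torch.nn as nn
from torch.distributions import Exponential, Categorical


class MTPPPolicy(nn.Module):
    def __init__(self, n_marks=3, hidden=32):
        super().__init__()
        self.embed = nn.Linear(1 + n_marks, hidden)  # embed (time gap, one-hot mark)
        self.rnn = nn.GRUCell(hidden, hidden)        # recurrent summary of event history
        self.rate_head = nn.Linear(hidden, 1)        # intensity lambda(h) > 0
        self.mark_head = nn.Linear(hidden, n_marks)  # mark distribution logits
        self.n_marks = n_marks
        self.hidden = hidden

    def sample_episode(self, horizon=10.0):
        """Roll out events up to the time horizon; return events and their log-probability."""
        h = torch.zeros(1, self.hidden)
        t, log_prob, events = 0.0, torch.zeros(()), []
        while True:
            rate = nn.functional.softplus(self.rate_head(h)).squeeze() + 1e-6
            gap_dist = Exponential(rate)
            gap = gap_dist.sample()
            if t + gap.item() > horizon:
                # no further event before the horizon: add the survival log-probability
                log_prob = log_prob - rate * (horizon - t)
                break
            mark_dist = Categorical(logits=self.mark_head(h).squeeze(0))
            mark = mark_dist.sample()
            log_prob = log_prob + gap_dist.log_prob(gap) + mark_dist.log_prob(mark)
            t += gap.item()
            events.append((t, mark.item()))
            x = torch.cat([gap.view(1, 1),
                           nn.functional.one_hot(mark, self.n_marks).float().view(1, -1)], dim=1)
            h = self.rnn(self.embed(x), h)
        return events, log_prob


def toy_reward(events):
    # Hypothetical placeholder reward: prefer roughly five events per episode.
    return -abs(len(events) - 5)


policy = MTPPPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
for step in range(200):
    events, log_prob = policy.sample_episode()
    loss = -toy_reward(events) * log_prob  # REINFORCE estimator
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The shared recurrent state feeding separate intensity and mark heads mirrors the abstract's description of embedding actions and feedback into real-valued vectors; in practice the reward and the feedback process would come from the application (e.g., learner recall or follower attention), not from a toy function as above.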
| Published in: | arXiv.org, 2018-11 |
| --- | --- |
| Main Authors: | Upadhyay, Utkarsh; De, Abir; Gomez-Rodriguez, Manuel |
| Format: | Article |
| Language: | English |
| Subjects: | Feedback; Marketing; Neural networks; Recurrent neural networks |
| Identifier: | EISSN: 2331-8422 |
| Publisher: | Ithaca: Cornell University Library, arXiv.org |
| Source: | Publicly Available Content (ProQuest) |
| Online Access: | Get full text |