
Real-time dynamic programming for Markov decision processes with imprecise probabilities

Markov Decision Processes (MDPs) have become the standard model for probabilistic planning. However, when applied to many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. The Markov Decision Process with Imprecise Transition Probabilities (MDP-IP) was introduced to obtain a robust policy when there is uncertainty in the transitions. Although a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) has been proposed that can solve problems with up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs), which use initial state information to solve complex problems by focusing on reachable states. We propose the (L)RTDP-IP algorithm, a (Labeled) Real-Time Dynamic Programming algorithm for SSP MDP-IPs, together with three different methods for sampling the next state. We show that the convergence of (L)RTDP-IP can be obtained using any of these three methods, even though the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints, which requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.

Bibliographic Details
Published in: Artificial intelligence, 2016-01, Vol. 230, p. 192-223
Main Authors: Delgado, Karina V., de Barros, Leliane N., Dias, Daniel B., Sanner, Scott
Format: Article
Language:English
Subjects: Algorithms; Backups; Dynamic programming; Markov decision process; Markov processes; Optimization; Policies; Probabilistic planning; Real time; Robust planning; Transition probabilities
container_end_page 223
container_issue
container_start_page 192
container_title Artificial intelligence
container_volume 230
creator Delgado, Karina V.
de Barros, Leliane N.
Dias, Daniel B.
Sanner, Scott
description Markov Decision Processes (MDPs) have become the standard model for probabilistic planning. However, when applied to many practical problems, the estimates of transition probabilities are inaccurate. This may be due to conflicting elicitations from experts or insufficient state transition information. The Markov Decision Process with Imprecise Transition Probabilities (MDP-IP) was introduced to obtain a robust policy when there is uncertainty in the transitions. Although a symbolic dynamic programming algorithm for MDP-IPs (called SPUDD-IP) has been proposed that can solve problems with up to 22 state variables, in practice, solving MDP-IP problems is time-consuming. In this paper we propose efficient algorithms for a more general class of MDP-IPs, called Stochastic Shortest Path MDP-IPs (SSP MDP-IPs), which use initial state information to solve complex problems by focusing on reachable states. We propose the (L)RTDP-IP algorithm, a (Labeled) Real-Time Dynamic Programming algorithm for SSP MDP-IPs, together with three different methods for sampling the next state. We show that the convergence of (L)RTDP-IP can be obtained using any of these three methods, even though the Bellman backups for this class of problems prescribe a minimax optimization. As far as we are aware, this is the first asynchronous algorithm for SSP MDP-IPs given in terms of a general set of probability constraints, which requires non-linear optimization over imprecise probabilities in the Bellman backup. Our results show up to three orders of magnitude speedup for (L)RTDP-IP when compared with the SPUDD-IP algorithm.
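The abstract notes that Bellman backups for SSP MDP-IPs prescribe a minimax optimization: the planner minimizes over actions while an adversarial "nature" maximizes expected cost over the credal set of transition probabilities. As a hedged illustration (not the paper's implementation, which handles general probability constraints via non-linear optimization), the sketch below shows this backup for the restricted special case of interval probability constraints, where the inner maximization admits a simple greedy solution; all function names and the toy numbers are illustrative assumptions.

```python
def worst_case_expectation(intervals, values):
    """Adversarial expectation max_p sum_i p_i * values[i] over the credal set
    {p : lo_i <= p_i <= hi_i, sum_i p_i = 1} given by probability intervals.
    Greedy: start each successor at its lower bound, then push the remaining
    probability mass onto the highest-valued successors first."""
    p = [lo for lo, _ in intervals]
    slack = 1.0 - sum(p)  # mass left to distribute (assumes sum of lows <= 1)
    for i in sorted(range(len(values)), key=lambda i: values[i], reverse=True):
        lo, hi = intervals[i]
        bump = min(hi - lo, slack)
        p[i] += bump
        slack -= bump
    return sum(pi * v for pi, v in zip(p, values))


def minimax_backup(costs, transitions, V):
    """One minimax Bellman backup at a state of a cost-based SSP MDP-IP:
    V(s) = min_a [ c(s,a) + max_{P in K_a} sum_{s'} P(s') V(s') ].
    `costs[a]` is the immediate cost; `transitions[a]` maps successor -> (lo, hi)."""
    best = float("inf")
    for a, succ in transitions.items():
        states = list(succ)
        q = costs[a] + worst_case_expectation(
            [succ[s] for s in states], [V[s] for s in states])
        best = min(best, q)
    return best


# Toy backup: two actions from one state over successors s1 (V=10) and s2 (V=1).
V = {"s1": 10.0, "s2": 1.0}
transitions = {
    "a": {"s1": (0.2, 0.6), "s2": (0.4, 0.8)},
    "b": {"s1": (0.5, 0.7), "s2": (0.3, 0.5)},
}
print(minimax_backup({"a": 1.0, "b": 0.5}, transitions, V))  # robust value of the state
```

An RTDP-style solver would repeat this backup only along trial trajectories sampled from the initial state, which is how (L)RTDP-IP focuses computation on reachable states; under the paper's general linear constraints the inner maximization would instead be solved by a (non-linear or linear-programming) optimizer.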
doi_str_mv 10.1016/j.artint.2015.09.005
format article
identifier ISSN: 0004-3702
ispartof Artificial intelligence, 2016-01, Vol.230, p.192-223
issn 0004-3702
1872-7921
language eng
source ScienceDirect Freedom Collection 2022-2024
subjects Algorithms
Backups
Dynamic programming
Markov decision process
Markov processes
Optimization
Policies
Probabilistic planning
Real time
Robust planning
Transition probabilities
title Real-time dynamic programming for Markov decision processes with imprecise probabilities