Simulation-based optimization of Markov reward processes

Bibliographic Details
Published in: IEEE Transactions on Automatic Control, 2001-02, Vol. 46 (2), p. 191-209
Main Authors: Marbach, P., Tsitsiklis, J.N.
Format: Article
Language: English
Description
Summary: This paper proposes a simulation-based algorithm for optimizing the average reward in a finite-state Markov reward process that depends on a set of parameters. As a special case, the method applies to Markov decision processes where optimization takes place within a parametrized set of policies. The algorithm relies on the regenerative structure of finite-state Markov processes, involves the simulation of a single sample path, and can be implemented online. A convergence result (with probability 1) is provided.
ISSN: 0018-9286, 1558-2523
DOI: 10.1109/9.905687