Constrained Markov Decision Models with Weighted Discounted Rewards

Bibliographic Details
Published in:	Mathematics of operations research 1995-05, Vol.20 (2), p.302-320
Main Authors:	Feinberg, Eugene A.; Shwartz, Adam
Format:	Article
Language:	English
Subjects:	additional constraints; Applied sciences; Constrained optimization; Decision making models; Decision theory. Utility theory; Dynamic programming; Exact sciences and technology; Geometric funnels; Integers; Markov decision processes; Markov models; Markov processes; Operational research and scientific management; Operational research. Management science; Optimal policy; Pareto efficiency; Random allocation; several discount factors

Description
Summary:	This paper deals with constrained optimization of Markov Decision Processes. Both the objective function and the constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g., in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (nonrandomized) from some step onward, and (ii) randomized Markov before this step, but the total number of actions which are added by randomization is bounded by the number of constraints. Optimality of such policies for multi-criteria problems is also established. These new policies have the pleasing aesthetic property that the amount of randomization they require over any trajectory is restricted by the number of constraints. This result is new even for constrained optimization with a single discount factor, where the optimality of randomized stationary policies is known. However, a randomized stationary policy may require an infinite number of randomizations over time. We also formulate a linear programming algorithm for approximate solutions of constrained weighted discounted models.
ISSN:	0364-765X, 1526-5471
DOI:	10.1287/moor.20.2.302
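
The weighted discounted criterion named in the summary (a sum of standard discounted rewards, each with its own discount factor) can be illustrated with a minimal sketch. It is not taken from the paper: the function name `weighted_discounted_value`, the finite-horizon truncation, and the toy one-state chain are all assumptions made for illustration. The sketch only evaluates the criterion for one fixed policy and does not implement the paper's constrained optimization or its linear programming algorithm.

```python
def weighted_discounted_value(P, rewards, betas, start, horizon=2000):
    """Approximate sum_k sum_t betas[k]**t * E[rewards[k][x_t]] for a fixed policy.

    Hypothetical helper, not from the paper.
    P[i][j]       transition probability from state i to j under the policy
    rewards[k][i] k-th one-step reward received in state i
    betas[k]      discount factor paired with rewards[k], with 0 <= betas[k] < 1
    start         index of the initial state
    horizon       truncation point for the infinite sums (an assumption)
    """
    n = len(P)
    # Distribution of the state at the current step; starts as a point mass.
    dist = [1.0 if i == start else 0.0 for i in range(n)]
    # Current power betas[k]**t for each reward stream.
    discounts = [1.0] * len(betas)
    total = 0.0
    for _ in range(horizon):
        for k, r in enumerate(rewards):
            expected_reward = sum(dist[i] * r[i] for i in range(n))
            total += discounts[k] * expected_reward
            discounts[k] *= betas[k]
        # Advance the state distribution by one step of the chain.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total


# Toy example: one absorbing state, two reward streams on different time scales.
# The closed form is 1/(1 - 0.5) + 2/(1 - 0.9) = 22, so value should be close to 22.
value = weighted_discounted_value(
    P=[[1.0]], rewards=[[1.0], [2.0]], betas=[0.5, 0.9], start=0
)
```

A constraint in the same model would be evaluated the same way, with its own reward vectors and discount factors, and compared against a bound.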