RANDOMIZED AND PAST-DEPENDENT POLICIES FOR MARKOV DECISION PROCESSES WITH MULTIPLE CONSTRAINTS
The Markov decision problem of finding a policy that maximizes the long-run average reward subject to K long-run average cost constraints is considered. The state and action spaces are assumed to be finite and the law of motion unichain, that is, every pure policy gives rise to a Markov chain with a single recurrent class. It is first proved that there exists an optimal stationary policy with a degree of randomization no greater than K; consequently, it is never necessary to randomize in more than K states. A linear program produces this optimal policy with limited randomization. For the special case of a single constraint, we also address the problem of finding optimal nonrandomized, but nonstationary, policies. We show that a round-robin type policy is optimal, and conjecture the same for a steering policy that depends on the entire past history of the process but whose implementation requires essentially no more storage than that of a pure policy. Numerous applications can be modeled as constrained Markov decision processes, including highway management and flow control in a data network.
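To make the linear-programming step concrete, here is a minimal sketch of the classical occupation-measure LP for unichain average-reward MDPs, the standard way such a program is posed (the paper's exact formulation may differ). All data below (P, r, c, alpha) and the dimensions are illustrative placeholders, and scipy's linprog stands in for whatever solver one prefers:

```python
# Hedged sketch: occupation-measure LP for a unichain constrained MDP.
# Variables x[s, a] = long-run fraction of time spent in state s while
# taking action a. All problem data here are random placeholders.
import numpy as np
from scipy.optimize import linprog

S, A, K = 3, 2, 1                            # states, actions, constraints
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, j] = Pr(next = j | s, a)
r = rng.random((S, A))                       # one-step rewards
c = rng.random((K, S, A))                    # one-step costs, k = 1, ..., K
alpha = np.full(K, 0.5)                      # cost budgets

# Maximize sum_{s,a} x[s,a] r[s,a]  <=>  minimize the negated objective.
obj = -r.reshape(-1)

# Balance equations: sum_a x[j,a] = sum_{s,a} x[s,a] P[s,a,j] for each j,
# plus the normalization sum_{s,a} x[s,a] = 1.
A_eq = np.zeros((S + 1, S * A))
for j in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[j, s * A + a] = float(s == j) - P[s, a, j]
A_eq[S, :] = 1.0
b_eq = np.append(np.zeros(S), 1.0)

# Long-run average cost constraints: sum_{s,a} x[s,a] c_k[s,a] <= alpha_k.
res = linprog(obj, A_ub=c.reshape(K, -1), b_ub=alpha,
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
if not res.success:
    raise RuntimeError("LP infeasible; relax the cost budgets alpha")

# Recover a stationary policy pi(a | s) from the occupation measure. A
# basic (vertex) optimal solution has at most S + K positive entries, so
# it randomizes in at most K states, matching the bound in the abstract.
x = res.x.reshape(S, A)
pi = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
print(np.round(pi, 3))
```

If the solver returns a vertex solution (scipy's default HiGHS simplex does), the printed policy should be deterministic in all but at most K states, here at most one.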
Corporate Authors:
Operations Research Society of America
Mount Royal and Guilford Avenue
Baltimore, MD 21202, United States
Authors:
- Ross, K. W.
- Publication Date: 1989-05
Media Info
- Features: References
- Pagination: p. 474-477
Serial:
- Operations Research
- Volume: 37
- Issue Number: 3
- Publisher: Operations Research Society of America
Subject/Index Terms
- TRT Terms: Decision making; Highway maintenance; Maintenance management; Markov processes; Pavement management systems; Policy
- Subject Areas: Administration and Management; Highways; Maintenance and Preservation; Pavements; Policy; I10: Economics and Administration
Filing Info
- Accession Number: 00485678
- Record Type: Publication
- Files: TRIS
- Created Date: Jul 31 1989 12:00AM