PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES AND PERIODIC POLICIES WITH APPLICATIONS

Department of Statistics and Insurance Science, University of Piraeus, 80 Karaoli and Dimitriou Street, Piraeus, 18534, Greece

and

D. STENGOS

Department of Statistics and Insurance Science, University of Piraeus, 80 Karaoli and Dimitriou Street, Piraeus, 18534, Greece

https://doi.org/10.1142/S0219622011004762

Abstract

This paper treats the infinite-horizon discounted-cost control problem for partially observable Markov decision processes. Sondik studied the class of finitely transient policies and showed that their value functions over an infinite time horizon are piecewise linear (p.w.l.) and can be computed exactly by solving a system of linear equations. However, the condition for finite transience is stronger than is needed to ensure p.w.l. value functions. In this paper, we introduce, as an alternative, the class of periodic policies, whose value functions also turn out to be p.w.l. Moreover, we examine a condition more general than both finite transience and periodicity that ensures p.w.l. value functions. We apply these ideas to a replacement problem under Markovian deterioration, investigate periodic policies in that setting, and give numerical examples.
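As a rough illustration of how a p.w.l. value function can be obtained from a system of linear equations, the sketch below evaluates a policy represented as a finite graph of nodes, each with one action and an observation-driven successor node. This representation, the function names, and all numbers (a two-state machine that is either good or failed, with "keep" and "replace" actions) are assumptions made for the example and are not taken from the paper; the paper's own definition of a periodic policy may differ. Each node j gets a vector alpha[j], and the value of starting in node j with belief b is the inner product of b and alpha[j].

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def evaluate_policy_graph(P, R, q, action, succ, beta):
    """Return alpha[j][s], the discounted cost of starting in node j, state s.

    P[a][s][s2]   : transition probability under action a
    R[a][s2][th]  : probability of observation th in new state s2
    q[a][s]       : immediate cost of action a in state s
    action[j]     : action taken in node j
    succ[j][th]   : next node after observing th in node j
    Solves alpha_j = q_a + beta * sum_th (P_a * R_a[:, th]) alpha_succ(j, th).
    """
    n = len(q[0])      # number of hidden states
    J = len(action)    # number of policy nodes
    size = J * n
    A = [[1.0 if r == c else 0.0 for c in range(size)] for r in range(size)]
    rhs = [0.0] * size
    for j in range(J):
        a = action[j]
        for s in range(n):
            rhs[j * n + s] = q[a][s]
            for th, k in enumerate(succ[j]):
                for s2 in range(n):
                    A[j * n + s][k * n + s2] -= beta * P[a][s][s2] * R[a][s2][th]
    x = solve_linear(A, rhs)
    return [x[j * n:(j + 1) * n] for j in range(J)]


# Hypothetical replacement example: state 0 = good, state 1 = failed.
P = [[[0.9, 0.1], [0.0, 1.0]],   # action 0 = keep: machine may deteriorate
     [[1.0, 0.0], [1.0, 0.0]]]   # action 1 = replace: machine is good again
R = [[[0.8, 0.2], [0.3, 0.7]],   # noisy inspection signal, same for both actions
     [[0.8, 0.2], [0.3, 0.7]]]
q = [[0.0, 4.0],                 # keeping a failed machine costs 4
     [3.0, 3.0]]                 # replacing always costs 3
action = [0, 1]                  # node 0 keeps, node 1 replaces
succ = [[1, 1], [0, 0]]          # the two nodes alternate: a cycle of period 2
beta = 0.9

alpha = evaluate_policy_graph(P, R, q, action, succ, beta)
belief = [0.5, 0.5]
value_from_node0 = sum(b * v for b, v in zip(belief, alpha[0]))
```

Because every node contributes one linear piece, a policy graph with finitely many nodes yields a value function made of finitely many linear pieces over the belief simplex. The small dense solver keeps the example dependency-free; a numerical library would be the natural choice for larger problems.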

References

J. E. Goulionis, Journal of Statistics and Management Systems 10(5), 715 (2007).
J. E. Goulionis and V. K. Benos, Advances and Applications in Statistics 7(3), 357 (2007).
S. Alonso et al., International Journal of Information Technology and Decision Making 8(2), 313 (2009).
M. Spaan and N. Vlassis, Journal of Artificial Intelligence Research 24, 195 (2005).
G. L. Kyriakopoulos, K. G. Kolovos and M. S. Chalikias, Communications in Computer and Information Science 112(2), 19 (2010), DOI: 10.1007/978-3-642-16324-1_3.
H. H. Lee, R. R. Chen and C. F. Lee, International Journal of Information Technology and Decision Making 8(4), 629 (2009).
F. J. Cabrerizo, S. Alonso and E. Herrera-Viedma, International Journal of Information Technology and Decision Making 8(1), 109 (2009).
D. L. Mclain and R. J. Aldag, International Journal of Information Technology and Decision Making 8(3), 407 (2009).
J. Ni and D. Khazanchi, International Journal of Information Technology and Decision Making 8(1), 55 (2009).
Y. Peng et al., International Journal of Information Technology and Decision Making 7(4), 639 (2008).
C. H. Papadimitriou and J. N. Tsitsiklis, Mathematics of Operations Research 12, 441 (1987), DOI: 10.1287/moor.12.3.441.
N. L. Zhang, T. Chen and T. Kocka, International Journal of Approximate Reasoning 38(3), 311 (2005).
M. Ohnishi, H. Kawai and H. Mine, European Journal of Operational Research 27(1), 117 (1986), DOI: 10.1016/S0377-2217(86)80014-5.
N. L. Zhang and W. Liu, Journal of Artificial Intelligence Research 7, 199 (1997).
N. L. Zhang, International Journal of Information Technology and Decision Making 1(1), 91 (2002).
W. Kim, J. S. Hong and Y. U. Song, International Journal of Information Technology and Decision Making 6, 61 (2007).
E. Sondik, Operations Research 26, 282 (1978), DOI: 10.1287/opre.26.2.282.
D. Blackwell, The Annals of Mathematical Statistics 36, 226 (1965), DOI: 10.1214/aoms/1177700285.
G. E. Monahan, Management Science 28(1), 1 (1982), DOI: 10.1287/mnsc.28.1.1.
P. L. Bartlett and J. Baxter, Stochastic optimization of controlled partially observable Markov decision processes, Proceedings of the IEEE Conference on Decision and Control 1 (2000) pp. 124–129.
D. Bertsekas, Stochastic Optimal Control. The Discrete-Time Case (Academic Press, NY, 1998).
K. J. Åström, Journal of Mathematical Analysis and Applications 10, 174 (1965).
E. V. Denardo, SIAM Review 9, 165 (1967), DOI: 10.1137/1009030.
N. Ahmad and P. A. Laplante, International Journal of Information Technology and Decision Making 8(1), 151 (2009).
M. Better et al., International Journal of Information Technology and Decision Making 8(1), 571 (2009).
D. F. Li, International Journal of Information Technology and Decision Making 8(2), 289 (2009).
A. Tewari and P. L. Bartlett, Bounded parameter Markov decision processes with average reward criterion, Proceedings of the Conference on Learning Theory (2007) pp. 263–277.
M. C. Chang, J. L. Hu and G. H. Tzeng, International Journal of Information Technology and Decision Making 8(3), 609 (2009).
J. Pineau, G. Gordon and S. Thrun, Point-based value iteration: An anytime algorithm for POMDPs, International Joint Conference on Artificial Intelligence (Acapulco, Mexico, 2003) pp. 1025–1032.
L. Zhao and Y. Jiang, International Journal of Information Technology and Decision Making 8(4), 769 (2009).
J. Choi and K.-E. Kim, Inverse reinforcement learning in partially observable environments, Proceedings of the International Joint Conference on Artificial Intelligence (2009).