Tag: Markov Model
[Reinforcement Learning] Markov Decision Process
This post introduces the Markov Decision Process (MDP), a core concept in reinforcement learning. It describes the basic structure and components of an MDP and explores the Bellman equation, which is used to find the optimal policy. Along the way it covers Markov chains, Markov reward processes, optimal value functions, and optimal policies. The next post will address the Partially Observable Markov Decision Process (POMDP).
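As a rough illustration of how the Bellman optimality equation can be used to find an optimal policy, the sketch below runs value iteration, one standard way to solve that equation, on a tiny two-state MDP. It repeatedly applies the backup V(s) ← max_a Σ_{s'} P(s'|s,a)[r + γV(s')] until the values stop changing, then reads off the greedy policy. Everything here is hypothetical and chosen for illustration: the states `s0`/`s1`, the actions `stay`/`go`, the rewards, the discount factor `gamma = 0.9`, and the `value_iteration` and `q_value` helpers. It is a minimal sketch under those assumptions, not the method used in the post itself.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(P: Transitions, V: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, theta: float = 1e-8):
    """Apply the Bellman optimality backup until the largest change is below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best action value in state s.
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update (Gauss-Seidel style)
        if delta < theta:
            break
    # Greedy policy with respect to the converged value function.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


# Hypothetical deterministic two-state example.
P: Transitions = {
    "s0": {"stay": [(1.0, "s0", 0.0)], "go": [(1.0, "s1", 1.0)]},
    "s1": {"stay": [(1.0, "s1", 2.0)], "go": [(1.0, "s0", 0.0)]},
}
V, policy = value_iteration(P, gamma=0.9)
print({s: round(v, 3) for s, v in V.items()})  # {'s0': 19.0, 's1': 20.0}
print(policy)  # {'s0': 'go', 's1': 'stay'}
```

The result can be checked by hand: staying in `s1` earns 2 per step forever, so V*(s1) = 2 / (1 - 0.9) = 20, and moving from `s0` to `s1` gives V*(s0) = 1 + 0.9 × 20 = 19, which beats staying in `s0` (0.9 × 19 = 17.1).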
