Tag: Markov Decision Process

[Reinforcement Learning] Dynamic Programming

Apr 10, 2024, by amoogae

in Reinforcement Learning, Knowledge

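Dynamic programming is a technique that uses an MDP model to derive an optimal solution. It applies the Bellman equation to carry out policy evaluation and improvement, policy iteration, and value iteration in order to find the optimal policy.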

[Reinforcement Learning] Markov Decision Process

Mar 31, 2024, by amoogae

in Reinforcement Learning, Knowledge

This post introduces the Markov Decision Process (MDP), a core concept in reinforcement learning. It discusses the basic structure and components of an MDP and explores the Bellman equation used to find the optimal policy. It covers Markov chains, Markov reward processes, optimal values, and policies. The next post will address the Partially Observable Markov Decision Process.