MDP in Reinforcement Learning
MDP Key Players!
States
Actions
Reward function
Policy
Agent
Value Function
state-value/action-value
Transition Model
planning
Environment
What are the solutions?
Model-based method - Planning
Exhaustive Search
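Exhaustive search can be sketched as follows. This assumes a small finite MDP whose model is known. The 3-state chain `P`, the discount `GAMMA = 0.9` and the helper names are hypothetical and serve only as an illustration. Every deterministic policy is enumerated and evaluated. That is |A|^|S| evaluations, which is why this approach only works for tiny problems.

```python
from itertools import product

# Hypothetical 3-state chain (illustration only).
# P[s][a] is a list of (probability, next_state, reward); state 2 is absorbing.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def evaluate(policy, P, gamma, sweeps=1000):
    """Approximate v_pi with repeated synchronous Bellman expectation backups."""
    V = {s: 0.0 for s in P}
    for _ in range(sweeps):
        V = {s: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]]) for s in P}
    return V


def exhaustive_search(P, gamma):
    """Try every deterministic policy and keep the one with the largest total value."""
    states = list(P)
    best_policy, best_score = None, float("-inf")
    # One action choice per state gives |A|^|S| candidate policies.
    for choice in product(*(list(P[s]) for s in states)):
        policy = dict(zip(states, choice))
        score = sum(evaluate(policy, P, gamma).values())
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score


best_policy, best_score = exhaustive_search(P, GAMMA)
print(best_policy, best_score)
```

Ranking policies by the sum of their state values works for one reason: an optimal policy is at least as good as every other policy in every state.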
Dynamic Programming
Bellman Equation
Value Iteration - Bellman Optimality Equation
Evaluation + Improvement at the same time
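Here is a minimal value-iteration sketch on a hypothetical 3-state chain; the model `P`, `GAMMA` and `theta` are illustrative assumptions. Each sweep applies the Bellman optimality backup, taking the max over actions, so evaluation and improvement happen in a single step. The greedy policy is read off once the values stop changing. This version updates `V` in place, a common variant of the sweep.

```python
# Hypothetical 3-state chain (illustration only).
# P[s][a] is a list of (probability, next_state, reward); state 2 is absorbing.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def value_iteration(P, gamma, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: value of the best action's expected return.
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update, reused later in the same sweep
        if delta < theta:
            break
    # Extract the greedy policy from the converged values.
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }
    return V, policy


V, policy = value_iteration(P, GAMMA)
print(V)       # {0: 0.9, 1: 1.0, 2: 0.0}
print(policy)  # {0: 'right', 1: 'right', 2: 'left'}
```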
Policy Iteration - Bellman Expectation Equation
Evaluation and then Improvement
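Policy iteration can be sketched on the same hypothetical chain; the model and all names are illustrative assumptions. It alternates two steps. First, full policy evaluation repeats the Bellman expectation backup until the values converge. Second, greedy improvement updates the policy. It stops when an improvement step changes nothing.

```python
# Hypothetical 3-state chain (illustration only).
# P[s][a] is a list of (probability, next_state, reward); state 2 is absorbing.
P = {
    0: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 1, 0.0)]},
    1: {"left": [(1.0, 0, 0.0)], "right": [(1.0, 2, 1.0)]},
    2: {"left": [(1.0, 2, 0.0)], "right": [(1.0, 2, 0.0)]},
}
GAMMA = 0.9


def q_value(P, V, s, a, gamma):
    """One-step lookahead: expected r + gamma * V(s') for taking a in s."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def policy_evaluation(policy, P, gamma, theta=1e-8):
    """Iterate the Bellman expectation backup for a fixed deterministic policy."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V


def policy_iteration(P, gamma):
    policy = {s: next(iter(P[s])) for s in P}  # arbitrary start: first listed action
    while True:
        V = policy_evaluation(policy, P, gamma)  # 1) evaluation, run to convergence
        stable = True
        for s in P:  # 2) greedy improvement
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            # Switch only on a strict gain so that ties cannot make the loop cycle.
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy


V, policy = policy_iteration(P, GAMMA)
print(policy)  # {0: 'right', 1: 'right', 2: 'left'}
```

Compared with value iteration, each outer loop here costs more, since evaluation runs to convergence. In exchange, the policy usually stabilizes after only a few improvement steps.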
Model-free method - Learning
Monte Carlo Method
Temporal Difference Learning
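The two model-free prediction methods differ in their update target. Monte Carlo waits for the episode's actual return G_t. TD(0) instead bootstraps from the current estimate V(s'). The minimal sketch below makes some assumptions: values live in a dict, the step size `alpha` is constant, and an episode is recorded as hypothetical `(state, reward)` pairs. Each reward is the one received on leaving that state.

```python
def mc_update(V, episode, alpha, gamma):
    """Every-visit, constant-alpha Monte Carlo: move V(s) toward the observed return G_t."""
    G = 0.0
    # Walk backwards so that G accumulates r_{t+1} + gamma * G_{t+1}.
    for s, r in reversed(episode):
        G = r + gamma * G
        V[s] += alpha * (G - V[s])
    return V


def td0_update(V, episode, alpha, gamma, terminal):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    for t, (s, r) in enumerate(episode):
        s_next = episode[t + 1][0] if t + 1 < len(episode) else terminal
        target = r + gamma * V[s_next]
        V[s] += alpha * (target - V[s])
    return V


# Hypothetical two-step episode A -> B -> terminal T, with reward 1 only on the last step.
episode = [("A", 0.0), ("B", 1.0)]
V_mc = mc_update({"A": 0.0, "B": 0.0, "T": 0.0}, episode, alpha=0.5, gamma=1.0)
V_td = td0_update({"A": 0.0, "B": 0.0, "T": 0.0}, episode, alpha=0.5, gamma=1.0, terminal="T")
print(V_mc)  # {'A': 0.5, 'B': 0.5, 'T': 0.0}
print(V_td)  # {'A': 0.0, 'B': 0.5, 'T': 0.0}
```

After one episode, TD(0) has not yet passed the reward back to A, because its target used the still-zero estimate V(B). Monte Carlo credits A immediately, but only after waiting for the episode to end.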
Off-policy - Q-learning
On-policy - SARSA
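The off-policy/on-policy distinction comes down to one term in the tabular update. Q-learning bootstraps from the greedy action in s'. SARSA bootstraps from the action a' that the behavior policy actually picked. This sketch assumes an epsilon-greedy behavior policy and dict-based Q tables. The state and action names and the numbers are hypothetical.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Behavior policy: random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(Q[s]))
    return max(Q[s], key=Q[s].get)


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: the target uses the max over actions in s_next, whatever is actually taken."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: the target uses a_next, the action the behavior policy actually chose."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def make_q():
    # Hypothetical action values: in s1, "L" looks good and "R" looks bad.
    return {"s0": {"L": 0.0, "R": 0.0}, "s1": {"L": 2.0, "R": -1.0}}


# The same experience for both methods: took "R" in s0, got reward 0, landed in s1,
# and the behavior policy then explored with a_next = "R".
Q_ql, Q_sarsa = make_q(), make_q()
q_learning_update(Q_ql, "s0", "R", 0.0, "s1", alpha=0.5, gamma=0.9)
sarsa_update(Q_sarsa, "s0", "R", 0.0, "s1", "R", alpha=0.5, gamma=0.9)
print(Q_ql["s0"]["R"], Q_sarsa["s0"]["R"])
print(epsilon_greedy(Q_ql, "s1", epsilon=0.0, rng=random.Random(0)))  # L
```

From identical experience, Q-learning raises Q(s0, R) toward the greedy value of s1. SARSA lowers it because the exploratory next action was R. This is the mechanism behind the often-cited observation that SARSA learns more cautious policies when exploration is risky.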
MDP objective: find a policy
that maximizes the expected sum of future rewards
Expected sum of future rewards?! How do we compute it?
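The "sum of future rewards" is usually kept finite with a discount factor γ in [0, 1]. The map does not name γ; it is a standard assumption here. The return is then G_t = R_{t+1} + γR_{t+2} + γ²R_{t+3} + ..., and the policy objective is the expected value of G_t. A minimal sketch that computes G for one recorded reward sequence:

```python
def discounted_return(rewards, gamma):
    """G_0 = r_1 + gamma * r_2 + gamma^2 * r_3 + ..., accumulated from the end."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1.5
```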
value function + Bellman equation
state-value function
v(s)
action-value function
Q(s, a): also called the Q-value or Q-function; the basis of Q-learning
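The two value functions are linked: v_π(s) is the policy-weighted average of q_π(s, a) over actions, v_π(s) = Σ_a π(a|s) q_π(s, a). The sketch below uses hypothetical numbers and assumes tabular dicts for `Q` and for a stochastic policy `pi`.

```python
def v_from_q(Q, pi, s):
    """v_pi(s) = sum over a of pi(a | s) * q_pi(s, a)."""
    return sum(pi[s][a] * Q[s][a] for a in Q[s])


def greedy_action(Q, s):
    """Acting greedily with respect to Q needs no model: pick argmax_a Q(s, a)."""
    return max(Q[s], key=Q[s].get)


# Hypothetical action values and a stochastic policy for a single state "s".
Q = {"s": {"left": 1.0, "right": 3.0}}
pi = {"s": {"left": 0.25, "right": 0.75}}
print(v_from_q(Q, pi, "s"))   # 2.5
print(greedy_action(Q, "s"))  # right
```

Choosing an action from v alone would also require the transition model to look one step ahead. With Q, no lookahead is needed, which is one reason model-free control methods such as Q-learning learn Q directly.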
Bellman equation
Bellman Expectation Equation
Bellman Optimality Equation
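For reference, here are the standard forms of both equations. They are written with the dynamics p(s', r | s, a), a policy π and a discount γ. This notation follows common textbook usage and is assumed here rather than taken from the map.

```latex
% Bellman expectation equations (fixed policy \pi)
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma\, v_\pi(s') \bigr]
q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a) \Bigl[ r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s', a') \Bigr]

% Bellman optimality equations
v_*(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a) \bigl[ r + \gamma\, v_*(s') \bigr]
q_*(s, a) = \sum_{s', r} p(s', r \mid s, a) \Bigl[ r + \gamma \max_{a'} q_*(s', a') \Bigr]
```

Policy evaluation iterates the expectation equation. Value iteration iterates the max in the optimality equation, and Q-learning applies a sampled version of that same max.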
What is a Markov Decision Process?
Here, the model is
the transition model of the MDP
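The definition can be stated compactly in the usual tuple notation. The symbols below are a common convention, assumed here rather than taken from the map.

```latex
% An MDP is a tuple of states, actions, transition model, reward function, and discount
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma)

% Transition model: the "model" that planning methods require
P(s' \mid s, a) = \Pr\bigl(S_{t+1} = s' \mid S_t = s,\, A_t = a\bigr)

% Expected immediate reward
R(s, a) = \mathbb{E}\bigl[R_{t+1} \mid S_t = s,\, A_t = a\bigr]

% Markov property: the next state depends only on the current state and action
\Pr\bigl(S_{t+1} \mid S_t, A_t\bigr) = \Pr\bigl(S_{t+1} \mid S_0, A_0, \dots, S_t, A_t\bigr)
```

Model-based methods (planning) assume P and R are known. Model-free methods (learning) only sample transitions by interacting with the environment.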