Intro to RL (Sutton)
tabular solution methods
Chap3 - finite MDPs
MAB estimates q*(a); an MDP agent estimates q*(s, a) and v*(s)
3 signals: actions and states (can be structured, e.g. vectors), rewards (always single scalar numbers)
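A minimal sketch of this agent-environment loop, assuming a made-up two-state MDP (the `SimpleMDP` class and its dynamics are illustrative, not from the book):

```python
import random

class SimpleMDP:
    """Toy two-state finite MDP; purely illustrative."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # Dynamics: action 1 toggles the state, action 0 keeps it.
        if action == 1:
            self.state = 1 - self.state
        # The reward signal is a single scalar number.
        reward = 1.0 if self.state == 1 else 0.0
        return self.state, reward

env = SimpleMDP()
state = env.state
for t in range(5):
    action = random.choice([0, 1])          # agent emits an action
    next_state, reward = env.step(action)   # environment returns state + reward
    print(t, state, action, reward, next_state)
    state = next_state
```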
Chap6 - TD learning
one-step, tabular, model-free TD methods
off-policy (two policies, as in Q-learning: a greedy target policy and an epsilon-greedy behavior policy; see the sketch below)
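A minimal tabular Q-learning sketch of this off-policy setup: an epsilon-greedy behavior policy picks actions while the update bootstraps on the greedy target policy. The environment interface (`env.reset()`, `env.step(a)` returning `(next_state, reward, done)`) and the hyperparameter defaults are assumptions:

```python
import random
from collections import defaultdict

def q_learning(env, n_actions, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    """One-step, tabular, model-free, off-policy TD control."""
    Q = defaultdict(float)  # Q[(state, action)] -> estimated action value

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Behavior policy: epsilon-greedy w.r.t. the current Q.
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(s, x)])
            s2, r, done = env.step(a)
            # Target policy: greedy -- bootstrap on max_a' Q(s', a').
            best_next = max(Q[(s2, x)] for x in range(n_actions))
            target = r + (0.0 if done else gamma * best_next)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Replacing the max in the target with the value of the action the behavior policy actually takes next would turn this into on-policy Sarsa.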
Overview
all RL methods can be viewed as two interacting processes revolving around an approximate policy (policy improvement) and an approximate value function (policy evaluation); Sutton calls this interplay generalized policy iteration (GPI)
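A sketch of that interplay as policy iteration on a known finite MDP; the transition representation `P[s][a]` as a list of `(prob, next_state, reward)` triples is an assumed encoding:

```python
def policy_iteration(P, n_states, n_actions, gamma=0.9, theta=1e-8):
    """Alternate policy evaluation and policy improvement until stable."""
    V = [0.0] * n_states   # approximate value function
    pi = [0] * n_states    # approximate (deterministic) policy

    def q(s, a):
        # Expected one-step return of taking a in s, then following V.
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        # Policy evaluation: make V consistent with the current policy.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q(s, pi[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # Policy improvement: make the policy greedy w.r.t. V.
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q(s, a))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return V, pi
```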
RL methods
DP - model-based, w/ bootstrapping
Monte Carlo methods - model-free, w/o bootstrapping
TD learning - model-free, w/ bootstrapping (the MC vs TD contrast is sketched after this list)
one-step, tabular, model-free TD
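The bootstrapping split above shows up directly in the two prediction updates; a minimal sketch, with `alpha` and `gamma` as assumed constants:

```python
alpha, gamma = 0.1, 0.99  # assumed step size and discount

def mc_update(V, state, G):
    """Monte Carlo: move V(s) toward the full observed return G.
    No bootstrapping -- G is computed from a complete episode."""
    V[state] += alpha * (G - V[state])

def td0_update(V, state, reward, next_state):
    """TD(0): move V(s) toward reward + gamma * V(s').
    Bootstrapping -- the target leans on the current estimate V(s')."""
    V[state] += alpha * (reward + gamma * V[next_state] - V[state])
```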
variations/extensions
model-based - planning (e.g. Dyna-Q, Chap8), link to DP; see the sketch below
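One concrete form of the planning link is Dyna-Q: a learned model replays simulated transitions between real steps. A sketch under the same assumed environment interface as above; `n_planning` and the deterministic-model assumption are illustrative choices:

```python
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=100, n_planning=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Dyna-Q: direct RL (Q-learning) plus planning from a learned model."""
    Q = defaultdict(float)
    model = {}  # (s, a) -> (reward, next_state); assumes deterministic dynamics

    def greedy(s):
        return max(range(n_actions), key=lambda a: Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.randrange(n_actions) if random.random() < epsilon else greedy(s)
            s2, r, done = env.step(a)
            # Direct RL: one-step Q-learning update from real experience.
            target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            model[(s, a)] = (r, s2)
            # Planning: replay simulated transitions sampled from the model
            # (terminal states keep Q = 0, so bootstrapping through them is safe).
            for _ in range(n_planning):
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += alpha * (pr + gamma * Q[(ps2, greedy(ps2))] - Q[(ps, pa)])
            s = s2
    return Q
```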