Q learning
Target
dogs chasing their tails.
More like supervised learning
Action value
Replay buffer
Prioritized Experience Replay
Epsilon-greedy
\[\nabla\log{\pi}\]
entropy
cross entropy
= max likelihood
KL divergence
max likelihood
The higher the probability, the larger \[\log{\pi}\], so it moves in the same direction as the reward
conjugate
basis
orthogonal
Q
Quadratic form
MDP
Optimal Value Function
Value Iteration
Convergence
Intuition
Contractions
Q-Values
Q-Value Iteration
Policy Evaluation
Policy Iteration
Sample-based (E)
tabular
Q learning
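
The last three nodes (sample-based, tabular, Q learning) can be tied together with a small sketch. This is only an illustration under stated assumptions, not something taken from the map: the 5-state chain environment, the function name `q_learning`, and all hyperparameter values are hypothetical. It shows the Q-value iteration backup replaced by a sampled target, with epsilon-greedy exploration over a table.

```python
import random

def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain: states 0..n_states-1,
    actions 0 (left) and 1 (right), reward 1 on reaching the last state."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]  # the table Q[s][a]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on steps per episode
            # Epsilon-greedy: explore with probability epsilon,
            # otherwise act greedily (ties broken at random).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(2) if Q[s][i] == best])
            s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
            done = s_next == goal
            r = 1.0 if done else 0.0
            # Sampled target: one observed transition stands in for the
            # expectation used by Q-value iteration.
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q

if __name__ == "__main__":
    for state, values in enumerate(q_learning()):
        print(state, [round(v, 3) for v in values])
```

Because the target is built from the same table that is being updated, it shifts as learning proceeds, which is the "dogs chasing their tails" point noted under Target.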