Reinforcement Learning
DQN
CNN
Experience Replay
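A minimal sketch of the Experience Replay idea from the DQN branch: store transitions in a bounded buffer and sample random minibatches to break temporal correlation. The class and parameter names here are illustrative assumptions, not a specific library's API.

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay: store transitions, sample random minibatches."""
    def __init__(self, capacity=10000):
        # deque with maxlen evicts the oldest transitions first (FIFO)
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # uniform sampling decorrelates consecutive transitions
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(5):
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(3)  # a list of 3 distinct transitions
```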
Off-Policy
Bellman Optimality Eqn
Bellman Eqn
Separate Target Network
Learning Stability
Temporal-difference method (Bootstrapping)
Tabular Q-Learning
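The Tabular Q-Learning node can be sketched as a single off-policy TD update on a dictionary-based Q-table; the toy states, actions, and step-size/discount values below are illustrative assumptions.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    td_target = r + gamma * best_next  # bootstrapped (biased) target
    Q[s][a] += alpha * (td_target - Q[s][a])
    return Q

# Toy two-state Q-table (hypothetical)
Q = {0: {'left': 0.0, 'right': 0.0}, 1: {'left': 0.0, 'right': 0.0}}
q_update(Q, 0, 'right', 1.0, 1)
# Q[0]['right'] is now 0.1 * (1.0 + 0.99 * 0.0 - 0.0) = 0.1
```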
Practice
Gym Practice
Cart Pole DQN
Policy Gradient Methods
REINFORCE
Monte-Carlo learning
Policy Gradient Theorem
REINFORCE with baseline (A2C)
Adding baseline
TD Actor-Critic (TDAC)
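The REINFORCE node in this branch can be sketched on a 2-armed bandit with a softmax policy, where each episode is one step so the reward equals the return. The logits `theta`, learning rate, and arm payoffs are illustrative assumptions.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exp = [math.exp(t - m) for t in theta]
    z = sum(exp)
    return [e / z for e in exp]

def reinforce(steps=2000, lr=0.1, seed=0):
    """Vanilla REINFORCE (no baseline) on a toy bandit:
    arm 1 always pays 1.0, arm 0 pays 0.0."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]  # policy logits
    for _ in range(steps):
        probs = softmax(theta)
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if a == 1 else 0.0  # reward == return for a 1-step episode
        # grad of log pi(a) wrt theta_k is 1[k == a] - pi(k)
        for k in range(2):
            theta[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])
    return softmax(theta)

probs = reinforce()  # probability mass shifts toward the rewarding arm
```

Subtracting a baseline (the A2C node above) would replace the raw return `r` with `r - b` to reduce the variance of this gradient estimate without changing its expectation.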
Value-based methods (Basic)
Dynamic Programming
Policy Iteration
Policy Evaluation
Policy Improvement
Value Iteration
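The Value Iteration node applies the Bellman optimality backup until the value function stops changing. The transition-table format `P[s][a] = [(prob, next_state, reward), ...]` and the toy two-state MDP below are assumptions for illustration.

```python
def value_iteration(P, gamma=0.9, theta=1e-8):
    """Sweep all states, replacing V(s) with max_a E[r + gamma * V(s')],
    until the largest change falls below theta."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                    for a in P[s])
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

# Hypothetical MDP: from state 0, 'go' yields reward 1 and reaches the
# absorbing state 1; all other transitions yield 0.
P = {
    0: {'stay': [(1.0, 0, 0.0)], 'go': [(1.0, 1, 1.0)]},
    1: {'stay': [(1.0, 1, 0.0)]},
}
V = value_iteration(P)  # V[0] converges to 1.0, V[1] to 0.0
```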
Bootstrapping
biased
Generalised Policy Iteration (GPI)
Monte Carlo Methods
MC Control
On-policy MC Control (for ε-soft policies)
MC Control ES
MC Prediction
Multi-Armed Bandits
Contextual Bandits
learning from sample average
unbiased
MC Prediction for state-value function
First-visit MC
Every-visit MC
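First-visit MC prediction, sketched for the state-value function: V(s) is the average of the returns that follow the first occurrence of s in each episode. The episode format `(state, reward-on-leaving-that-state)` is an assumption for illustration; every-visit MC would average over all occurrences instead.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Episodes are lists of (state, reward) pairs; the reward is received
    on the transition out of that state."""
    returns = defaultdict(list)
    for episode in episodes:
        # first time-step at which each state appears
        first_visit = {}
        for t, (s, _) in enumerate(episode):
            if s not in first_visit:
                first_visit[s] = t
        # compute the return G_t for every time-step, working backwards
        G = 0.0
        Gs = [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            _, r = episode[t]
            G = r + gamma * G
            Gs[t] = G
        for s, t in first_visit.items():
            returns[s].append(Gs[t])
    # V(s) is the plain sample average of observed returns (unbiased)
    return {s: sum(v) / len(v) for s, v in returns.items()}

episodes = [[('A', 1.0), ('B', 0.0)],
            [('A', 0.0), ('B', 1.0)]]
V = first_visit_mc(episodes)  # V['A'] = 1.0, V['B'] = 0.5
```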
Practice
Exploitation vs Exploration
Huber Loss
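The Huber Loss node, written out: quadratic near zero, linear beyond a threshold `delta`, which caps the gradient magnitude of large TD errors (one reason it is popular in DQN training). The default `delta=1.0` is the common convention, assumed here.

```python
def huber_loss(error, delta=1.0):
    """0.5 * e^2 for |e| <= delta, else delta * (|e| - 0.5 * delta)."""
    if abs(error) <= delta:
        return 0.5 * error ** 2
    return delta * (abs(error) - 0.5 * delta)

huber_loss(0.5)  # 0.125  (quadratic region, same as squared error)
huber_loss(3.0)  # 2.5    (linear region, grows slower than 0.5 * 9 = 4.5)
```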
MC Estimation for action-value function
exploring starts
ε-greedy
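The ε-greedy rule balances the Exploitation vs Exploration node above: with probability ε pick a uniformly random action, otherwise pick the greedy one. The list-of-Q-values interface is an illustrative assumption.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """Return a random action index with prob. epsilon, else the argmax."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit

epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)  # always greedy: action 1
```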