Reinforcement learning (action selection strategies for exploration…
Reinforcement learning
find optimal policy
value-based methods
input: state & action
output: value (expected cumulative reward)
Q-learning
deep Q-network
policy-based methods
input: state
output: action
gradient-based methods
vanilla policy gradient
natural policy gradient
derivative-free methods
cross-entropy
evolutionary-based methods
actor-critic methods
catastrophic forgetting
sweep rehearsal (1993)
Srivastava (2013)
Ian Goodfellow (2013/5)
Prioritized experience replay (2016)
Deep generative replay (2017)
architecture
multi-armed bandit
contextual multi-armed bandits (one-state RL)
full RL (MDP)
action selection strategies for exploration
greedy approach
random approach
epsilon-greedy approach
Boltzmann approach
Bayesian approach (with dropout)
intrinsic motivation
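The first four action selection strategies listed above (greedy, random, epsilon-greedy, Boltzmann) can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a discrete action set and a plain list of estimated action values, and the names `q_values`, `epsilon`, `temperature` and all function names are hypothetical. The Bayesian (dropout) approach and intrinsic motivation need a learned model and are not covered here.

```python
import math
import random


def greedy(q_values):
    """Pick the action with the highest estimated value (ties go to the lowest index)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def random_action(q_values, rng=random):
    """Pick an action uniformly at random, ignoring the value estimates."""
    return rng.randrange(len(q_values))


def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return random_action(q_values, rng)
    return greedy(q_values)


def boltzmann_probs(q_values, temperature=1.0):
    """Softmax over the value estimates; higher temperature gives a flatter distribution."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann(q_values, temperature=1.0, rng=random):
    """Sample an action in proportion to its softmax probability."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]  # made-up value estimates for three actions
    print("greedy:", greedy(q))
    print("epsilon-greedy:", epsilon_greedy(q, epsilon=0.2))
    print("boltzmann probabilities:", boltzmann_probs(q, temperature=0.5))
```

Setting `epsilon` to 0 reduces epsilon-greedy to the greedy approach and setting it to 1 reduces it to the random approach. In the Boltzmann approach a very low `temperature` approaches greedy selection, while a very high one approaches uniform random selection.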