ML-5
reinforcement learning
parts
agent
environment
actions
rewards
reward shaping
subgoals
state-action-reward loop
goal
maximizing the cumulative reward
policy
decides which action to take based on the Q-table (see the sketch after this branch)
greedy (exploit the action with the highest Q-value)
random (explore)
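A minimal sketch of such a policy, combining the greedy and random choices above into epsilon-greedy action selection (the table shape and epsilon value are assumptions, not from the diagram):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_table, state, epsilon=0.1):
    """With probability epsilon act randomly (explore), else greedily (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))   # random action
    return int(np.argmax(q_table[state]))            # best-known action

q_table = np.zeros((5, 3))        # hypothetical 5 states x 3 actions
action = epsilon_greedy(q_table, state=0)
```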
examples
AlphaGo Zero
stock trading
autonomous robot navigation
Snake
the math
discount factor
gamma (γ)
a small gamma: the near future matters much more than the far future
a large gamma: the far future matters too
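In symbols, gamma weights future rewards in the discounted return (standard convention, not spelled out in the diagram):

```latex
G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad 0 \le \gamma \le 1
```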
Q-learning
Q-value
denotes the expected future reward of taking that action in that state
Q-table
Bellman equation (written out below)
a plain Q-table is not feasible for big state spaces
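For reference, the Bellman optimality equation for Q-values, with s' the next state and the expectation taken over the environment's transitions:

```latex
Q^*(s, a) = \mathbb{E}\big[\, r + \gamma \max_{a'} Q^*(s', a') \;\big|\; s, a \,\big]
```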
Markov property
the Bellman equation relies on it
the agent does not need to remember past states to predict the future; the current state is enough
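A minimal sketch of the tabular Q-learning update that follows from the Bellman equation (the learning rate alpha and the table shape are assumptions):

```python
import numpy as np

def q_update(q_table, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Move Q(s, a) a step of size alpha toward the Bellman target."""
    target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (target - q_table[s, a])

q_table = np.zeros((5, 3))                 # hypothetical 5 states x 3 actions
q_update(q_table, s=0, a=1, r=1.0, s_next=2)
```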
deep Q-learning
input: the state (e.g. a stack of 4 frames)
output: a Q-value for every action
frame stacking
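A sketch of frame stacking: the last 4 frames are stacked into one input so the network can see motion (the 84x84 frame size is an assumption):

```python
from collections import deque
import numpy as np

frames = deque(maxlen=4)                   # only the 4 most recent frames survive

def push_frame(frame):
    frames.append(frame)
    while len(frames) < 4:                 # pad with copies at episode start
        frames.append(frame)
    return np.stack(frames)                # shape (4, H, W), the network input

state = push_frame(np.zeros((84, 84)))     # hypothetical 84x84 grayscale frame
```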
replay buffer
saving and replaying past experiences
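A minimal replay buffer sketch: transitions are stored and later sampled in random minibatches, which breaks the correlation between consecutive experiences (capacity and batch size are assumptions):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest experiences fall out first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        return random.sample(self.buffer, batch_size)
```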
double deep Q-learning
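Double deep Q-learning uses two networks: the online network selects the next action and a separate target network evaluates it, which reduces plain DQN's overestimation bias. A sketch of the target computation (the two Q-value arrays stand in for network outputs):

```python
import numpy as np

def double_dqn_target(reward, q_online_next, q_target_next, gamma=0.99, done=False):
    """Online net picks the action, target net scores it."""
    best_action = int(np.argmax(q_online_next))
    return reward + (0.0 if done else gamma * q_target_next[best_action])
```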
neuroevolution
the theory of evolution
survival of the fittest
sexual reproduction
cross-over (mix two parents' weights)
elitism (the best individuals survive unchanged; see the sketch below)
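A sketch of one neuroevolution generation over flat weight vectors, combining the ideas above (the elite count, mutation scale, and parent-selection rule are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def next_generation(population, fitness, n_elite=2, mutation_std=0.02):
    """population: list of equal-length weight vectors; fitness: one score each."""
    order = np.argsort(fitness)[::-1]                  # survival of the fittest
    elites = [population[i] for i in order[:n_elite]]  # elitism: keep the best as-is
    children = []
    while len(elites) + len(children) < len(population):
        a, b = rng.choice(order[: len(order) // 2], size=2)   # two fit parents
        mask = rng.random(population[a].shape) < 0.5          # uniform cross-over
        child = np.where(mask, population[a], population[b])
        children.append(child + rng.normal(0.0, mutation_std, child.shape))
    return elites + children
```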
we use neuroevolution
because the loss landscape resembles Swiss cheese
full of local minima
no labeled training data
RL technologies
libraries
Stable Baselines 3
TF-Agents
Tensorforce
Keras-RL
OpenAI Baselines
frameworks
OpenAI Gym
Google Dopamine
RLlib
Keras-RL
Optuna
a hyperparameter optimization framework
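A minimal Optuna sketch; the objective below is a stand-in (a real one would train an agent with the suggested values and return its mean reward):

```python
import optuna

def objective(trial):
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.9999)
    # stand-in score; replace with: train agent, evaluate, return mean reward
    return -((lr - 3e-4) ** 2) - (gamma - 0.99) ** 2

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```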
proximal policy optimization (PPO)
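Tying the list together, a sketch of training PPO with Stable Baselines 3 (assumes a recent SB3 with Gymnasium; the environment and timestep budget are placeholders):

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, verbose=1)   # PPO: clipped-objective policy gradient
model.learn(total_timesteps=50_000)
model.save("ppo_cartpole")                 # hypothetical output path
```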