Sample Inefficiency in Deep RL
- Small learning rates
  - Why are these small? Small changes in parameter space can cause huge (unwanted) changes in the action distribution.
  - How to overcome this? Trust-region updates: choose the largest learning rate for an update that does not change the policy too much (a sketch follows this branch).
  - Trust regions limit the change in policy: continuous, stable, but slow improvement per update!
  - Trust-region dependencies?
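A minimal sketch of the trust-region idea, in the spirit of a backtracking line search: start from the full step and shrink it until the KL divergence to the old policy stays below a budget. The names (trust_region_step, kl_after_step) and all numbers are illustrative assumptions, not a specific implementation from this diagram.

```python
import numpy as np

def trust_region_step(params, full_step, kl_after_step, max_kl=1e-2,
                      backtrack=0.5, max_iters=10):
    """Pick the largest step (learning rate) whose resulting policy stays
    within a KL 'trust region' around the current policy."""
    step = np.asarray(full_step, dtype=float)
    for _ in range(max_iters):
        candidate = params + step
        if kl_after_step(candidate) <= max_kl:   # policy did not change too much
            return candidate
        step = backtrack * step                  # shrink the step and retry
    return params                                # reject the update entirely

# Toy usage: a quadratic stand-in for the KL divergence in parameter space.
old = np.zeros(3)
new = trust_region_step(old, full_step=np.ones(3),
                        kl_after_step=lambda p: float(np.sum((p - old) ** 2)))
```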
- Poor policy initialization: starting far from the optimum.
- Stochastic policy: the policy gradient is an expectation w.r.t. states and actions, so Monte Carlo estimates of it are high-variance.
  - The deterministic policy gradient is an expectation w.r.t. states only, but then the policy needs explicit exploration, e.g. optimistic Q(s, a) initialization or noisy states/actions (see the sketch after this branch).
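A small Monte Carlo sketch of that difference, for a toy linear-Gaussian policy and a known quadratic Q-function (all values are made-up assumptions): the stochastic policy gradient averages over sampled states and actions, the deterministic one over states only. Both should print roughly the same gradient (about 1.0 here), but the stochastic estimate fluctuates more across seeds for the same sample size.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 0.5, 0.5          # policy parameter and exploration noise (assumed)
n = 10_000

def q(s, a):                     # toy known critic: the best action is a = s
    return -(a - s) ** 2

def dq_da(s, a):                 # its action-gradient, used by the deterministic PG
    return -2.0 * (a - s)

s = rng.normal(size=n)           # sampled states

# Stochastic policy gradient: expectation over states AND sampled actions.
a = theta * s + sigma * rng.normal(size=n)      # a ~ N(theta * s, sigma^2)
score = (a - theta * s) * s / sigma**2          # d/dtheta log pi(a | s)
spg = np.mean(score * q(s, a))

# Deterministic policy gradient: expectation over states only.
a_det = theta * s                                # a = mu_theta(s)
dpg = np.mean(s * dq_da(s, a_det))               # chain rule: dmu/dtheta * dQ/da

print(f"SPG estimate: {spg:.3f}   DPG estimate: {dpg:.3f}")
```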
- Exploration
  - stochastic policy
  - noisy nets
  - optimistic value initialization (sketched below)
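One concrete item from this list, optimistic value initialization, sketched for the tabular case (the sizes and the optimistic constant are arbitrary assumptions): every untried action initially looks better than it can really be, so even a greedy policy keeps trying actions until their estimates are pushed down toward the truth.

```python
import numpy as np

n_states, n_actions = 10, 4
optimistic_value = 10.0      # assumed to exceed any achievable return

# Optimistic initialization: unexplored (s, a) pairs look attractive by default.
Q = np.full((n_states, n_actions), optimistic_value)

def greedy_action(state):
    return int(np.argmax(Q[state]))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard Q-learning update; repeated visits pull Q[s, a] down from
    # its optimistic start toward the true value, which drives exploration.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```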
- Hyperparameters
- High dimensionality
  - state space
  - action space
  - high number of NN parameters
- Throwing away experiences
  - What could they be used for? Improving the policy, learning a model (a replay-buffer sketch follows this branch).
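A minimal replay-buffer sketch (a standard technique, not something prescribed by this diagram): store transitions instead of discarding them, so each environment interaction can be reused for many updates or, later, for fitting a dynamics model.

```python
import random
from collections import deque

class ReplayBuffer:
    """Keep past (s, a, r, s', done) transitions for reuse."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions drop out first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```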
- Absence of a model
  - Models are imprecise; use a model only where it is precise, i.e. it may be partially precise, for certain (s, a, s', r) tuples (a heuristic sketch follows this branch).
  - Why are models imprecise?
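A heuristic sketch of "use the model only where it is precise", using disagreement within an ensemble of learned dynamics models as a stand-in for precision; models, env_step, and the threshold are illustrative assumptions rather than a method stated in the diagram.

```python
import numpy as np

def model_is_precise(models, state, action, threshold=0.05):
    """Trust the learned model for this (s, a) only where the ensemble agrees."""
    predictions = np.stack([m(state, action) for m in models])
    disagreement = predictions.std(axis=0).mean()    # spread across ensemble members
    return disagreement < threshold

def next_state(models, env_step, state, action):
    # Use a cheap model prediction where it looks precise,
    # otherwise fall back to a real environment interaction.
    if model_is_precise(models, state, action):
        return np.mean([m(state, action) for m in models], axis=0)
    return env_step(state, action)
```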
- Maximization of the return expectation
  - The expectation might not be the optimum of the distribution (a small numeric illustration follows).
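A tiny numeric illustration (made-up numbers) of why the expectation alone can mislead: option B wins on expected return even though its typical outcome is worse than option A's certain one.

```python
import numpy as np

rng = np.random.default_rng(0)
returns_a = np.full(100_000, 1.0)                                 # always 1
returns_b = rng.choice([0.0, 20.0], size=100_000, p=[0.9, 0.1])   # usually 0, rarely 20

print(returns_a.mean(), returns_b.mean())            # ~1.0 vs ~2.0: B looks better in expectation
print(np.median(returns_a), np.median(returns_b))    # 1.0 vs 0.0: but B's typical outcome is worse
```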
- The considered time horizon is too long -> use n-step returns (sketched below).
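A short sketch of an n-step return: sum n discounted rewards, then bootstrap with a value estimate instead of rolling the horizon out further. The reward values, discount, and the bootstrap value below are arbitrary assumptions.

```python
def n_step_return(rewards, bootstrap_value, n, gamma=0.99):
    """Truncate the considered horizon after n rewards and bootstrap."""
    n = min(n, len(rewards))
    g = sum(gamma**k * rewards[k] for k in range(n))
    return g + gamma**n * bootstrap_value

# Example: 3-step return with V(s_{t+3}) = 5.0 as the bootstrap value.
print(n_step_return(rewards=[1.0, 0.0, 2.0], bootstrap_value=5.0, n=3))
```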