Deep Reinforcement Learning
Introduction
Observation/State Space
State: complete description of the world, with no hidden information (e.g., the full board in chess)
Observation: partial description of the world (e.g., a single frame in Super Mario)
Action Space
Discrete: the set of possible actions is finite (e.g., left, right, jump)
Continuous: the set of possible actions is infinite (e.g., a steering angle)
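A minimal sketch of the two action-space types, using the Gymnasium library (an assumption; the notes do not name a framework):

```python
import numpy as np
from gymnasium import spaces

# Discrete: a finite set of actions, e.g. {0: left, 1: right, 2: jump, 3: fire}
discrete_actions = spaces.Discrete(4)
print(discrete_actions.sample())  # one of 0, 1, 2, 3

# Continuous: an infinite set of actions, e.g. steering and throttle in [-1, 1]
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
print(continuous_actions.sample())  # e.g. array([ 0.31, -0.87], dtype=float32)
```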
Rewards
The reward tells the agent whether the action it took was good or bad
Gamma is the discount factor (typically between 0.95 and 0.99)
Larger gamma -> smaller discount on long-term rewards (long-term rewards preferred)
Smaller gamma -> bigger discount on long-term rewards (short-term rewards favored)
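A small sketch of how gamma discounts future rewards when computing the return (plain Python; the reward sequence is illustrative):

```python
def discounted_return(rewards, gamma):
    """Compute G_0 = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

rewards = [1.0, 1.0, 1.0, 10.0]          # illustrative reward sequence
print(discounted_return(rewards, 0.99))  # ~12.67: the distant +10 counts almost fully
print(discounted_return(rewards, 0.50))  # 3.0: the distant +10 is heavily discounted
```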
Task types
Episodic task: the starting point and ending point are defined. Ex: in Super Mario, an episode starts with a new level and ends when the level is cleared or Mario is killed
Continuous task: no terminal state; the task runs until stopped by the user. Ex: automated stock trading
Exploitation/Exploration Tradeoff
Exploration: trying random actions to gather more information about the environment (chance of higher rewards)
Exploitation: using already-known information to maximize the reward
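One common way to balance the two is an epsilon-greedy rule: explore with probability epsilon, otherwise exploit the current value estimates (a sketch; the Q-values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
q_values = np.zeros(4)  # hypothetical action-value estimates for one state

def epsilon_greedy(q_values, epsilon):
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: best-known action
```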
Solving RL Problems
Policy-based method: state -> best possible action (the policy is learned directly)
Deterministic:
The policy at a given state always returns the same action, a = π(s)
Stochastic:
The policy outputs a probability distribution over actions, π(a|s)
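A sketch of the two policy types over a discrete action space (numpy only; the states and probabilities are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
action_probs = {  # hypothetical policy distributions pi(a|s) for two states
    "s0": np.array([0.7, 0.2, 0.1]),
    "s1": np.array([0.1, 0.1, 0.8]),
}

def deterministic_policy(state):
    # Always returns the same action for a given state: a = pi(s)
    return int(np.argmax(action_probs[state]))

def stochastic_policy(state):
    # Samples an action from a probability distribution: a ~ pi(.|s)
    p = action_probs[state]
    return int(rng.choice(len(p), p=p))
```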
Value-based method: state -> expected value of being in that state
Value of a state: the expected discounted return the agent gets if it starts in that state
The implied policy: act so as to move to the reachable state with the highest value
In a value-based method, you train the agent to recognize which states are more valuable.
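A sketch of acting greedily with respect to a learned state-value table (the corridor environment, values, and transition model are all hypothetical):

```python
# Hypothetical learned state values V(s) on a tiny 1-D corridor; the goal is state 3.
values = {0: 0.5, 1: 0.7, 2: 0.9, 3: 1.0}

def next_states(state):
    # Hypothetical transition model: the agent can step left or right.
    return [s for s in (state - 1, state + 1) if s in values]

def greedy_action_target(state):
    # The implied policy: move to the reachable state with the highest value.
    return max(next_states(state), key=lambda s: values[s])

print(greedy_action_target(1))  # 2: the agent heads toward higher-value states
```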
Notes
Reward hypothesis: any goal can be framed as the maximization of expected cumulative reward