Deep Reinforcement Learning
Introduction
Observation/State Space
State: complete description of the world, with no hidden information (e.g., the full board in chess)
Observation: partial description of the world (e.g., a single frame in Super Mario)
Action Space
Discrete: the set of possible actions is finite (e.g., left, right, jump)
Continuous: the set of possible actions is infinite (e.g., a steering angle)
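A minimal sketch of the two action-space types, using the Gymnasium library (an assumption; the notes do not name a framework):

```python
import numpy as np
from gymnasium import spaces

# Discrete: a finite set of actions, e.g. {0: left, 1: right, 2: jump, 3: fire}
discrete_actions = spaces.Discrete(4)
print(discrete_actions.sample())  # one of 0, 1, 2, 3

# Continuous: an infinite set of actions, e.g. steering and throttle in [-1, 1]
continuous_actions = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)
print(continuous_actions.sample())  # e.g. array([ 0.31, -0.87], dtype=float32)
```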
Rewards
The reward tells the agent whether the action it took was good or bad
Gamma is the discount factor (typically between 0.95 and 0.99)
Larger gamma -> smaller discount on long-term rewards (long-term rewards preferred)
Smaller gamma -> bigger discount on long-term rewards (short-term rewards favored)
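A small sketch of how gamma discounts future rewards when computing the return (plain Python; the reward sequence is illustrative):

```python
def discounted_return(rewards, gamma):
    """Compute G_0 = r_0 + gamma*r_1 + gamma^2*r_2 + ..."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

rewards = [1.0, 1.0, 1.0, 10.0]          # illustrative reward sequence
print(discounted_return(rewards, 0.99))  # ~12.67: the distant +10 counts almost fully
print(discounted_return(rewards, 0.50))  # 3.0: the distant +10 is heavily discounted
```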
Task types
Episodic task: the starting point and ending point are defined. Ex: in Super Mario, an episode starts with a new level and ends when the level is cleared or Mario is killed
Continuous task: no terminal state; the task runs until stopped by the user. Ex: automated stock trading
Exploitation/Exploration Tradeoff
Exploration: trying random actions to gather more information about the environment (chance of higher rewards)
Exploitation: using already-known information to maximize the reward
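One common way to balance the two is an epsilon-greedy rule: explore with probability epsilon, otherwise exploit the current value estimates (a sketch; the Q-values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
q_values = np.zeros(4)  # hypothetical action-value estimates for one state

def epsilon_greedy(q_values, epsilon):
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore: random action
    return int(np.argmax(q_values))              # exploit: best-known action
```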
Solving RL Problems
Policy-based method: state -> best possible action (the policy is learned directly)
Deterministic:
The policy at a given state always returns the same action, a = π(s)
Stochastic:
The policy outputs a probability distribution over actions, π(a|s)
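A sketch of the two policy types over a discrete action space (numpy only; the states and probabilities are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
action_probs = {  # hypothetical policy distributions pi(a|s) for two states
    "s0": np.array([0.7, 0.2, 0.1]),
    "s1": np.array([0.1, 0.1, 0.8]),
}

def deterministic_policy(state):
    # Always returns the same action for a given state: a = pi(s)
    return int(np.argmax(action_probs[state]))

def stochastic_policy(state):
    # Samples an action from a probability distribution: a ~ pi(.|s)
    p = action_probs[state]
    return int(rng.choice(len(p), p=p))
```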
Value-based method: state -> expected value of being in that state
Value of a state: the expected discounted return the agent gets if it starts in that state
The implied policy: act so as to move to the reachable state with the highest value
In a value-based method, you train the agent to recognize which states are more valuable.
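A sketch of acting greedily with respect to a learned state-value table (the corridor environment, values, and transition model are all hypothetical):

```python
# Hypothetical learned state values V(s) on a tiny 1-D corridor; the goal is state 3.
values = {0: 0.5, 1: 0.7, 2: 0.9, 3: 1.0}

def next_states(state):
    # Hypothetical transition model: the agent can step left or right.
    return [s for s in (state - 1, state + 1) if s in values]

def greedy_action_target(state):
    # The implied policy: move to the reachable state with the highest value.
    return max(next_states(state), key=lambda s: values[s])

print(greedy_action_target(1))  # 2: the agent heads toward higher-value states
```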
Notes
Reward hypothesis: any goal can be framed as the maximization of expected cumulative reward