series 1 - Coggle Diagram
series 1
episode 1
First state S(0,0)
The agent starts at location (0,0) on the grid
city_grid
city_grid = np.full((GRID_SIZE, GRID_SIZE), None)
The agent is placed in the top-left corner (0,0)
Compatibility Matrix
It is used to calculate the compatibility between buildings, so the agent knows which buildings work better next to each other
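A minimal sketch of how such a matrix could be looked up. The building names, the scores in COMPAT, and the choice to sum over the four orthogonal neighbours are all hypothetical; the diagram does not give the real building list or values.

```python
import numpy as np

# Hypothetical building types; the diagram does not list the real ones.
BUILDINGS = ["house", "park", "shop", "factory"]
INDEX = {name: i for i, name in enumerate(BUILDINGS)}

# Hypothetical symmetric scores: COMPAT[i, j] says how well building i sits next to building j.
COMPAT = np.array([
    [ 1.0,  0.8,  0.5, -1.0],  # house
    [ 0.8,  0.2,  0.6, -0.5],  # park
    [ 0.5,  0.6,  0.3,  0.0],  # shop
    [-1.0, -0.5,  0.0,  0.4],  # factory
])


def compatibility(grid, row, col, building):
    """Sum the compatibility of `building` with its already-placed orthogonal neighbours."""
    total = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
        if inside and grid[r][c] is not None:  # empty cells (None) contribute nothing
            total += COMPAT[INDEX[building], INDEX[grid[r][c]]]
    return total
```

For example, with a house at (0,0), placing a park at (0,1) scores 0.8 under these made-up values.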
alpha = 0.8
Represents the learning rate; it determines the extent to which new information overrides old information
gamma = 0.95
Represents the discount factor; it controls the importance of future rewards (a higher value makes the agent more patient, a lower value makes it more impatient).
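The diagram names alpha and gamma but not the update formula. Assuming the standard tabular Q-learning rule, Q(s,a) += alpha * (r + gamma * max Q(s',a') - Q(s,a)), a sketch looks like this; the dict-based Q-table and the function name q_update are hypothetical choices.

```python
from collections import defaultdict

ALPHA = 0.8   # learning rate, value from the diagram
GAMMA = 0.95  # discount factor, value from the diagram

# Hypothetical Q-table layout: q_table[state][action] -> value, unseen entries default to 0.0.
q_table = defaultdict(lambda: defaultdict(float))


def q_update(state, action, reward, next_state, next_actions):
    """Standard Q-learning update (assumed, not confirmed by the diagram)."""
    best_next = max((q_table[next_state][a] for a in next_actions), default=0.0)
    td_target = reward + GAMMA * best_next  # reward now plus discounted best future value
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])
    return q_table[state][action]
```

With alpha = 0.8, a single reward of 1.0 on an empty table moves the entry straight to 0.8, which shows how strongly new information overrides old.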
Possible actions
The agent chooses which building to place in its current cell (0,0).
At this point the choice is random because the agent has no previous knowledge (the Q-table is empty).
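Why the first pick is random can be sketched with an epsilon-greedy rule: with no Q-values for the current state there is nothing to exploit, so the agent falls back to a random building. The building list and the function name choose_building are hypothetical.

```python
import random

BUILDINGS = ["house", "park", "shop", "factory"]  # hypothetical building types


def choose_building(q_row, epsilon, rng=random):
    """Epsilon-greedy pick; an empty Q-table row always gives a random building."""
    if not q_row or rng.random() < epsilon:
        return rng.choice(BUILDINGS)  # explore (or nothing learned yet)
    return max(q_row, key=q_row.get)  # exploit: highest Q-value so far
```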
S(0,x)
All variable values are the same as in the previous state, except city_grid, which now contains the building placed in the previous state
Possible actions
The agent chooses which building to place in its current cell (0,x).
At this point the choice is random because the agent has no previous knowledge (the Q-table is empty).
At the end of the state, the agent moves one cell to the right; this repeats until there is no cell to the right.
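A sketch of the first row under these rules: one building per state, the grid carrying earlier placements forward, and a step right until the row ends. GRID_SIZE, the building list, and fill_first_row are hypothetical.

```python
import random

import numpy as np

GRID_SIZE = 4  # hypothetical grid size
BUILDINGS = ["house", "park", "shop", "factory"]  # hypothetical building types


def fill_first_row(rng):
    """Walk S(0,0) -> S(0,1) -> ..., placing one building per cell and moving right."""
    city_grid = np.full((GRID_SIZE, GRID_SIZE), None)
    col = 0
    while col < GRID_SIZE:  # stop once there is no cell to the right
        # city_grid still holds the buildings placed in the previous states of this row
        city_grid[0, col] = rng.choice(BUILDINGS)  # random pick: the Q-table is still empty
        col += 1  # end of the state: move one cell to the right
    return city_grid


grid = fill_first_row(random.Random(0))
```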
S(x,y)
The agent repeats the previous states across the row and then moves down one row. It repeats the process until it ends in the lower-right corner of the grid
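The visiting order can be sketched as a row-major walk, assuming the agent restarts at the left end of each new row (the diagram only says it "goes down 1"). traversal_order is a hypothetical name.

```python
def traversal_order(size):
    """Yield cells in visiting order: left to right along a row, then down one row."""
    for row in range(size):
        for col in range(size):
            yield (row, col)


order = list(traversal_order(3))  # the last cell is the lower-right corner (2, 2)
```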
Episode x
The difference between episodes is that the value of epsilon decreases every episode, so the agent can begin exploiting instead of only exploring
X
Represents the episode number; at the end of each episode, x increases
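The diagram does not say how fast epsilon shrinks. One common choice, shown here as an assumption, is multiplicative decay per episode with a small floor; EPSILON_DECAY and EPSILON_MIN are hypothetical values.

```python
EPSILON_START = 1.0   # episode 1 is fully random
EPSILON_DECAY = 0.99  # hypothetical per-episode decay factor
EPSILON_MIN = 0.05    # hypothetical floor so some exploration always remains


def epsilon_for_episode(x):
    """Epsilon for episode x (1-based); it shrinks each time x increases."""
    return max(EPSILON_MIN, EPSILON_START * EPSILON_DECAY ** (x - 1))
```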
An episode is one pass through all the states for a particular part of the grid; that is, every state is visited exactly once
After the reward is calculated, the Q-table is filled in based on the rewards the agent received (this is how the agent learns)
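Putting the pieces together, one episode could look like the sketch below. Several parts are assumptions: the building list and scores are made up, the reward only checks the left and upper neighbours (the ones already placed in row-major order), the state is simplified to the agent's cell rather than cell plus city_grid, and the update is standard Q-learning.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.8, 0.95  # values from the diagram
BUILDINGS = ["house", "park", "factory"]  # hypothetical building types
# Hypothetical symmetric compatibility scores between building pairs.
COMPAT = {("house", "house"): 1.0, ("house", "park"): 0.8, ("house", "factory"): -1.0,
          ("park", "park"): 0.2, ("park", "factory"): -0.5, ("factory", "factory"): 0.4}


def pair_score(a, b):
    return COMPAT.get((a, b), COMPAT.get((b, a), 0.0))


def reward(grid, row, col, building):
    """Hypothetical reward: compatibility with the left and upper neighbours."""
    score = 0.0
    if col > 0:
        score += pair_score(building, grid[row][col - 1])
    if row > 0:
        score += pair_score(building, grid[row - 1][col])
    return score


def run_episode(size, q_table, epsilon, rng):
    """One episode: visit every cell once and update the Q-table after each reward."""
    grid = [[None] * size for _ in range(size)]
    cells = [(r, c) for r in range(size) for c in range(size)]
    for i, (r, c) in enumerate(cells):
        q_row = q_table[(r, c)]  # simplified state: just the agent's cell
        if not q_row or rng.random() < epsilon:
            action = rng.choice(BUILDINGS)  # explore
        else:
            action = max(q_row, key=q_row.get)  # exploit
        grid[r][c] = action
        rwd = reward(grid, r, c, action)
        nxt = cells[i + 1] if i + 1 < len(cells) else None
        best_next = max(q_table[nxt].values(), default=0.0) if nxt is not None else 0.0
        q_row[action] += ALPHA * (rwd + GAMMA * best_next - q_row[action])
    return grid


q_table = defaultdict(lambda: defaultdict(float))
rng = random.Random(0)
for episode in range(1, 51):
    run_episode(3, q_table, max(0.05, 0.95 ** episode), rng)  # hypothetical epsilon schedule
```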
A smaller grid is chosen
The grid is based on the actual grid, except that it starts small; with each series, the grid grows by adding a different section.
Epsilon differs from section to section depending on the series: in the first series a section is part of, its epsilon is 1, and the more series the section has been part of, the lower its epsilon.
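A sketch of the growing grid and per-section epsilon. The full size, starting size, growth step, and the halving decay are all hypothetical; a "section" is assumed here to be the band of cells added in a given series.

```python
FULL_SIZE = 6   # hypothetical size of the actual city grid
START_SIZE = 2  # hypothetical size of the first, smaller grid
STEP = 2        # hypothetical growth per series (one new section)


def grid_size_for_series(series):
    """Side length of the training grid in a 1-based series, capped at the full grid."""
    return min(FULL_SIZE, START_SIZE + STEP * (series - 1))


def first_series_of_cell(row, col):
    """Series in which the section containing (row, col) is first added."""
    if max(row, col) >= FULL_SIZE:
        raise ValueError("cell is outside the full grid")
    series = 1
    while grid_size_for_series(series) <= max(row, col):
        series += 1
    return series


def cell_epsilon(row, col, series, decay=0.5):
    """1.0 in the section's first series, then lower for each later series."""
    seen_before = series - first_series_of_cell(row, col)
    if seen_before < 0:
        raise ValueError("cell is not part of the grid in this series")
    return decay ** seen_before
```

In series 3, the cells from the original 2x2 corner have epsilon 0.25 under these values, while the newly added outer band starts at 1.0.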