AI
- Search algorithms
~ the order in which search nodes are expanded
~ the way duplicate elimination is used
assume:
- Deterministic events
- Environments change only as the result of an action
- Perfect knowledge
- Single actor
State Model S(P)
- a finite and discrete state space S
- a known initial state s0 ∈ S
- a set SG ⊆ S of goal states
- actions A(s) ⊆ A applicable in each s ∈ S
- a deterministic transition function s' = f(a, s) for a ∈ A(s)
- positive action costs c(a, s)
Blind Search
Depth-first Search
Completeness: no; it can follow infinitely long paths unless the state space is acyclic, hence a cycle check is needed
Optimality: no; it just hopes to get lucky
Iterative Deepening (ID), sketched in code below
Completeness: yes, since the growing depth limit rules out getting stuck on infinitely long paths (a cycle check along the current path still avoids redundant work)
Optimality: yes for uniform action costs, because it finds a shallowest goal state first
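A minimal iterative-deepening DFS sketch; the successor and goal-test functions are hypothetical callbacks supplied by the caller, not part of the notes:

```python
# Minimal iterative-deepening DFS sketch. successors(state) and is_goal(state)
# are hypothetical, passed in by the caller; a path-based cycle check prevents
# descending into infinitely long paths.

def depth_limited_dfs(state, limit, path, successors, is_goal):
    if is_goal(state):
        return list(path)
    if limit == 0:
        return None
    for succ in successors(state):
        if succ in path:                       # cycle check on the current path
            continue
        path.append(succ)
        result = depth_limited_dfs(succ, limit - 1, path, successors, is_goal)
        if result is not None:
            return result
        path.pop()
    return None

def iterative_deepening(start, successors, is_goal, max_depth=50):
    for limit in range(max_depth + 1):         # re-run DFS with a growing depth limit
        plan = depth_limited_dfs(start, limit, [start], successors, is_goal)
        if plan is not None:
            return plan                        # shallowest goal state is found first
    return None
```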
Criteria:
- completeness: finds a solution whenever one exists
- optimality: returns a least-cost solution
- soundness: any solution returned is a correct solution
Criteria 2 (complexity)
b: maximal branching factor (number of successors) over all states
d: depth of the shallowest goal state
- Space complexity: measured in the number of states kept in memory
- Time complexity: measured in the number of generated states
- Monte Carlo Tree Search (MCTS)
- approximates Q-values using random simulations
- incrementally builds the search tree
- finishes when a pre-defined budget (time/space) is used up [it can terminate at any point and return the best answer so far]
- heuristic [works well with MDPs: stochastic domains]
- select: descend to a node that still has unexpanded actions
- expand: add a new child node for one untried action
- simulate: run a random playout from the new node
- backpropagation: discard the simulated states, record the result at the expanded node, and then propagate it back up the tree (a sketch tying these steps to ε-greedy selection follows this list)
- ε-greedy: exploit the best action with probability 1-ε, choose a random action with probability ε
- ε-decreasing: ε-greedy with ε decreased over time
- Softmax: choose each action with probability proportional to e^(Q(s,a)/τ) (Boltzmann distribution)
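A minimal MCTS sketch combining the four steps with ε-greedy selection, assuming a deterministic simulator and hypothetical callbacks actions(s), step(s, a) and is_terminal(s):

```python
import random

# Minimal MCTS sketch (deterministic simulator assumed for simplicity).
# Hypothetical callbacks, not course code:
#   actions(s) -> list of applicable actions
#   step(s, a) -> (next_state, reward)
#   is_terminal(s) -> bool
# Each node stores the average total return of the simulations that passed through it.

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # running average of simulation returns

def epsilon_greedy(node, epsilon):
    # exploit the best child with probability 1 - epsilon, explore otherwise
    if random.random() < epsilon:
        return random.choice(list(node.children))
    return max(node.children, key=lambda a: node.children[a].value)

def mcts(root_state, actions, step, is_terminal, budget=1000,
         epsilon=0.1, rollout_depth=20):
    root = Node(root_state)
    for _ in range(budget):
        node, visited, total = root, [root], 0.0
        # 1. Select: descend through fully expanded nodes via epsilon-greedy
        while (not is_terminal(node.state)
               and node.children
               and len(node.children) == len(actions(node.state))):
            a = epsilon_greedy(node, epsilon)
            _, r = step(node.state, a)
            total += r
            node = node.children[a]
            visited.append(node)
        # 2. Expand: add one child for a not-yet-tried action
        if not is_terminal(node.state):
            a = random.choice([a for a in actions(node.state)
                               if a not in node.children])
            s2, r = step(node.state, a)
            total += r
            child = Node(s2)
            node.children[a] = child
            node = child
            visited.append(node)
        # 3. Simulate: random rollout; keep only the accumulated reward
        s = node.state
        for _ in range(rollout_depth):
            if is_terminal(s):
                break
            s, r = step(s, random.choice(actions(s)))
            total += r
        # 4. Backpropagate: update visit counts and running averages on the path
        for n in visited:
            n.visits += 1
            n.value += (total - n.value) / n.visits
    # after the budget is used up, return the best root action found so far
    return max(root.children, key=lambda a: root.children[a].value)
```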
MCTS vs Value/Policy Iteration
- considers only reachable states vs exhaustive over the whole state space
- lower computational cost vs higher
- lower coverage vs higher
- online: computes a decision when it encounters a state it has not considered before vs offline: computed once and reused many times
function approximation
Q-table needs too much space to store for non-trivial problems
- we don't observe any rewards for a long time at the beginning (a linear-approximation sketch follows below)
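A minimal sketch of linear Q-function approximation; the feature function features(s, a) is a hypothetical placeholder, and the weight vector replaces the Q-table:

```python
# Minimal linear Q-function approximation sketch (replaces the Q-table).
# features(s, a) is a hypothetical feature function returning a fixed-length
# list of floats; only the weight vector needs to be stored.

def q_value(weights, feats):
    # Q(s, a) is approximated as the dot product of weights and features
    return sum(w * f for w, f in zip(weights, feats))

def td_update(weights, feats, reward, best_next_q, alpha=0.1, gamma=0.9):
    # Q-learning style update applied to the weights instead of a table entry:
    #   delta = r + gamma * max_a' Q(s', a') - Q(s, a)
    delta = reward + gamma * best_next_q - q_value(weights, feats)
    return [w + alpha * delta * f for w, f in zip(weights, feats)]
```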
- Classical planning width (width-based planning)
models and simulators
simulators provide (see the sketch below):
- the set of applicable actions A(s) in a state s
- the state transition function f(a, s)
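One way to capture that interface, as a minimal sketch (the counter domain and the method names are illustrative assumptions, not from the notes):

```python
# Minimal simulator interface sketch: it only exposes the applicable actions
# and the transition function f(a, s), hiding how the model is implemented.

class CounterSimulator:
    def applicable_actions(self, s):
        # A(s): "dec" is not applicable when the counter is already at 0
        return [a for a in ("inc", "dec") if not (a == "dec" and s == 0)]

    def transition(self, a, s):
        # f(a, s): deterministic successor state
        return s + 1 if a == "inc" else s - 1

sim = CounterSimulator()
print(sim.applicable_actions(0))   # ['inc']
print(sim.transition("inc", 0))    # 1
```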
- MDPs and value/policy iteration
value/policy iteration
- offline methods
- exhaustively compute the optimal policy
- based on the Bellman equations
The differences between value iteration and policy iteration:
- For value iteration, the space of value functions is continuous (infinitely many candidates), so we cannot tell whether the current values already induce the optimal policy; we iterate until the values change by less than a threshold (or we run out of time) and report a policy that is optimal or approximately optimal (a value iteration sketch follows the MDP definition below).
- For policy iteration, the number of policies is finite because the numbers of states and actions are finite, so we know it reaches the optimal policy after finitely many policy improvements, even though that number may be huge.
MDPs: fully observable, probabilistic
- a state space S
- initial state s0 ∈ S
- actions A(s) ⊆ A applicable in each state s ∈ S
- transition probabilities Pa(s'|s) for s ∈ S and a ∈ A(s)
- rewards r(s, a, s') (positive or negative) for transitioning from state s to state s' using action a
- a discount factor 0 ≤ γ ≤ 1
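A minimal value iteration sketch over these components; the input names (S, A, P, r, gamma) are assumptions for illustration:

```python
# Minimal value iteration sketch for the MDP components above.
# Assumed inputs (illustrative names): states S (iterable), A(s) -> actions,
# P(a, s) -> {s': probability}, r(s, a, s2) -> reward, discount gamma.

def value_iteration(S, A, P, r, gamma, epsilon=1e-6):
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            if not A(s):
                continue                       # no applicable actions: value stays 0
            # Bellman optimality update:
            #   V(s) = max_a  sum_s' Pa(s'|s) * ( r(s, a, s') + gamma * V(s') )
            best = max(
                sum(p * (r(s, a, s2) + gamma * V[s2]) for s2, p in P(a, s).items())
                for a in A(s)
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < epsilon:                    # stop once values have converged
            return V
```

Policy iteration would instead alternate policy evaluation with greedy policy improvement and stop once the policy no longer changes, which happens after finitely many iterations.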
- Planning models, languages, and computational approaches.
Describe arbitrary search problems
Models
- finite and discrete state space S
- a set S0 ⊆ S of possible initial states
- a set SG ⊆ S of goal states
- actions A(s) ⊆ A applicable in each s ∈ S
- a non-deterministic transition function F(a, s) ⊆ S for a ∈ A(s)
- uniform action costs c(a, s)
Planning Languages:
STRIPS
Problem: a tuple P = <F, O, I, G>
- F: set of boolean facts (atoms)
- O: set of operators/actions
- I ⊆ F: initial situation
- G ⊆ F: goal situation
Operators (see the sketch after this list):
- Add list: Add(o) ⊆ F
- Delete list: Del(o) ⊆ F
- Precondition: Prec(o) ⊆ F
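A minimal STRIPS-style representation as a sketch; the fact names and the pickup(a) example are made up for illustration:

```python
from collections import namedtuple

# Minimal STRIPS sketch: facts are strings, states are frozensets of facts,
# and an operator carries Prec, Add and Del sets.

Operator = namedtuple("Operator", ["name", "prec", "add", "delete"])

def applicable(op, state):
    # an operator is applicable when all of its preconditions hold in the state
    return op.prec <= state

def apply_op(op, state):
    # successor state: remove the delete list, then add the add list
    return (state - op.delete) | op.add

# illustrative example: pick up block a from the table
pickup_a = Operator(
    name="pickup(a)",
    prec=frozenset({"clear(a)", "ontable(a)", "handempty"}),
    add=frozenset({"holding(a)"}),
    delete=frozenset({"clear(a)", "ontable(a)", "handempty"}),
)

initial = frozenset({"clear(a)", "ontable(a)", "handempty"})
goal = frozenset({"holding(a)"})

if applicable(pickup_a, initial):
    print(goal <= apply_op(pickup_a, initial))   # True: goal reached
```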
- Generating heuristics, delete relaxation
Relaxation
Delete Relaxation Heuristics
- ignoring delete effects, actions only make more facts true
- relaxation can only make the problem easier (it can never make a solvable problem unsolvable)
Approximate h+: Additive and Max Heuristics
- reduce the computational cost of delete relaxation (quick to compute)
- approximate h+
- assumption: all sub-goal facts are achieved independently
h^max is optimistic (an under-estimate) → suitable for optimal search
h^max ≤ h+, and thus h^max ≤ h*
therefore h^max is admissible
h^add is pessimistic (can over-estimate)
h^add ≥ h+, and it may over-estimate h*(s)
therefore h^add is not admissible, but it is more informed
better informed for the wrong reason: it over-counts (which can cause a dramatic over-estimate of h*) because it ignores positive interactions, i.e. sub-plans shared between sub-goals (a fixed-point sketch for computing h^max and h^add follows below)
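A sketch of computing h^max and h^add by a fixed-point pass over fact costs, reusing the STRIPS-style operators from the earlier sketch (illustrative only, unit action costs assumed):

```python
import math

# Sketch: h^max and h^add over the delete relaxation. Fact costs start at 0 for
# facts in the state and infinity otherwise, then are lowered via the operators
# (delete lists ignored) until a fixed point is reached.

def relaxed_heuristic(state, goal, operators, combine):
    facts = set(state) | set(goal)
    for op in operators:
        facts |= op.prec | op.add
    cost = {f: (0.0 if f in state else math.inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for op in operators:
            # cost of the precondition set: max for h^max, sum for h^add
            pre = combine(cost[f] for f in op.prec) if op.prec else 0.0
            for f in op.add:
                new_cost = 1.0 + pre                # unit action cost assumed
                if new_cost < cost[f]:
                    cost[f] = new_cost
                    changed = True
    # the heuristic combines the relaxed costs of all goal facts
    return combine(cost[f] for f in goal) if goal else 0.0

def h_max(state, goal, operators):
    return relaxed_heuristic(state, goal, operators, max)

def h_add(state, goal, operators):
    return relaxed_heuristic(state, goal, operators, sum)

# e.g. with the pickup(a) operator from the STRIPS sketch:
# h_max(initial, goal, [pickup_a]) == 1.0 and h_add(initial, goal, [pickup_a]) == 1.0
```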
- Game theory: normal form games and extensive form games