AI
Informed Search
- New Idea: Heuristic
- Consistent \(…
MDP
- \( s \in S \) States
- \( a \in A \) Actions (previously implicit)
New:
- Transition function \( T(s, a, s') \) -> non-deterministic successor function
- Reward \( R(s,a,s') \)
- New: policy (map from states to actions) \( \pi : S \rightarrow A \)
- Utility = sum of future rewards ( :red_cross: relation to optimality )
- Goal: find a policy \( \pi^* \) which maximizes expected utility
- Offline planning: T and R are known
- Assumption: state space is countable and small
Value Iteration (Goal: calculating \( V^*(s) \))
- Idea: view the Bellman equations as an iterative update rule (sketch below)
- initialise with \( V_0(s) = 0 \)
\( V_{t+1}(s) = \max_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V_t (s')] \)
- Problem: slow, \( \mathcal{O}(S^2 A) \) per iteration
- to get the optimal policy \( \pi^* \rightarrow \) policy extraction
- Observation: the policy converges before the values -> considering all actions wastes compute -> Policy Iteration
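A minimal value-iteration sketch, assuming the model is stored as `transitions[(s, a)] = [(s', probability, reward), ...]`; the names and the tiny example MDP are illustrative only, not from these notes.

```python
# Value iteration sketch (illustrative names; assumes a small, enumerable state space).
# transitions[(s, a)] is a list of (s_next, probability, reward) triples,
# encoding both T(s, a, s') and R(s, a, s').

def value_iteration(states, actions, transitions, gamma=0.9, iterations=100):
    V = {s: 0.0 for s in states}                      # V_0(s) = 0
    for _ in range(iterations):
        # V_{t+1}(s) = max_a sum_{s'} T(s,a,s') [R(s,a,s') + gamma * V_t(s')]
        V = {
            s: max(
                sum(p * (r + gamma * V[s2]) for s2, p, r in transitions[(s, a)])
                for a in actions
            )
            for s in states
        }
    return V

# Tiny illustrative MDP: two states, two actions.
states = ["A", "B"]
actions = ["stay", "go"]
transitions = {
    ("A", "stay"): [("A", 1.0, 0.0)],
    ("A", "go"):   [("B", 0.8, 1.0), ("A", 0.2, 0.0)],
    ("B", "stay"): [("B", 1.0, 2.0)],
    ("B", "go"):   [("A", 1.0, 0.0)],
}
print(value_iteration(states, actions, transitions))
```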
Policy Iteration:
- start with a random policy \( \pi \)
- do many steps of policy evaluation
- do one step of policy extraction (improvement); repeat until the policy no longer changes (sketch below)
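A rough policy-iteration sketch, reusing the assumed `(s, a) -> [(s', p, r)]` model encoding from the value-iteration sketch above; `sweeps` and the stopping test are illustrative choices.

```python
import random

def policy_evaluation(policy, states, transitions, gamma=0.9, sweeps=50):
    """Many sweeps of policy evaluation: V^pi(s) = sum_s' T(s,pi(s),s') [R + gamma V^pi(s')]."""
    V = {s: 0.0 for s in states}
    for _ in range(sweeps):
        V = {
            s: sum(p * (r + gamma * V[s2]) for s2, p, r in transitions[(s, policy[s])])
            for s in states
        }
    return V

def policy_iteration(states, actions, transitions, gamma=0.9):
    policy = {s: random.choice(actions) for s in states}   # start with a random policy
    while True:
        V = policy_evaluation(policy, states, transitions, gamma)
        # one step of policy extraction / improvement w.r.t. V^pi
        improved = {
            s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                              for s2, p, r in transitions[(s, a)]))
            for s in states
        }
        if improved == policy:      # policy converged (often before the values do)
            return policy, V
        policy = improved
```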
Q-Value Iteration (Goal: calculating \( Q^*(s, a) \)) :question:
- Idea: view the Bellman equations as an iterative update rule (sketch below)
- initialise with \( Q_0(s, a) = 0 \) :question:
- \( Q_{t+1}(s, a) = \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma \max_{a'} Q_t(s', a')] \)
- Problem: slow, \( \mathcal{O}(S^2 A) \) per iteration :question:
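The corresponding Q-value-iteration sketch, same assumed `(s, a) -> [(s', p, r)]` encoding:

```python
def q_value_iteration(states, actions, transitions, gamma=0.9, iterations=100):
    Q = {(s, a): 0.0 for s in states for a in actions}    # Q_0(s, a) = 0
    for _ in range(iterations):
        # Q_{t+1}(s,a) = sum_{s'} T(s,a,s') [R(s,a,s') + gamma * max_{a'} Q_t(s',a')]
        Q = {
            (s, a): sum(p * (r + gamma * max(Q[(s2, a2)] for a2 in actions))
                        for s2, p, r in transitions[(s, a)])
            for s in states for a in actions
        }
    return Q
```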
Policy Extraction (Goal: get a policy consistent with the optimal values; sketches below)
- from values \( V^* \)
- \( \pi^*(s) = \arg\max_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')] \)
- \( \mathcal{O}(S^2 A) \)
- from \( Q^* \) values :red_cross:
- \( \pi^*(s) = \arg\max_a Q^*(s, a) \)
- \( \mathcal{O}(S A) \)
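Sketches of both extraction variants (again assuming the `(s, a) -> [(s', p, r)]` model encoding); extraction from \( Q^* \) needs no model, which is why it is only \( \mathcal{O}(S A) \).

```python
def extract_policy_from_v(V, states, actions, transitions, gamma=0.9):
    # pi*(s) = argmax_a sum_{s'} T(s,a,s') [R(s,a,s') + gamma V*(s')]   -> O(S^2 A)
    return {
        s: max(actions, key=lambda a: sum(p * (r + gamma * V[s2])
                                          for s2, p, r in transitions[(s, a)]))
        for s in states
    }

def extract_policy_from_q(Q, states, actions):
    # pi*(s) = argmax_a Q*(s, a)   -> O(S A), no transition model needed
    return {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
```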
Policy Evaluation (Goal: calculate \( V^\pi \) for a fixed policy \( \pi \)):
- \( V^\pi(s) = \sum_{s'} T(s, \pi(s), s') [R(s, \pi(s), s') + \gamma V^\pi(s')] \)
- \( \mathcal{O}(S^2) \) per iteration
Basics:
Value of a state:
- \(V^*(s) = \max_a Q^*(s,a) \)
- \( V^*(s) = \max_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')] \)
Q-Value of a state:
- \( Q^*(s, a) = \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')] \)
- \( Q^*(s, a) = \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma \max_{a'} Q^*(s', a')] \)
Reinforcement Learning:
New:
- T and R are unknown
- Learn by interacting with the environment
Model-based
- Estimate T and R by interacting with the environment
- Do offline planning based on the estimates (sketch below)
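A possible sketch of the model-estimation step, assuming experience is logged as `(s, a, r, s')` tuples (names are illustrative); the estimates \( \hat{T}, \hat{R} \) can then be plugged into the offline-planning sketches above.

```python
from collections import defaultdict

def estimate_model(experience):
    """experience: iterable of (s, a, r, s') tuples collected while acting."""
    counts = defaultdict(lambda: defaultdict(int))    # (s, a) -> {s': visit count}
    reward_sum = defaultdict(float)                   # (s, a, s') -> summed reward
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a, s2)] += r
    T_hat, R_hat = {}, {}
    for (s, a), successors in counts.items():
        total = sum(successors.values())
        for s2, n in successors.items():
            T_hat[(s, a, s2)] = n / total                     # empirical T(s, a, s')
            R_hat[(s, a, s2)] = reward_sum[(s, a, s2)] / n    # empirical R(s, a, s')
    return T_hat, R_hat
```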
Model-free
Policy Evaluation (Goal: evaluate \( V^\pi \)):
- act according to some policy \( \pi \)
- after visiting s, transitioning into s', and receiving reward r (sketch below):
\( V^\pi(s) \leftarrow V^\pi(s) + \alpha ([r + \gamma V^\pi(s')] - V^\pi(s)) \)
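A one-line sketch of this temporal-difference update on a dictionary of values (names illustrative):

```python
def td_update(V, s, r, s2, alpha=0.1, gamma=0.9):
    # V^pi(s) <- V^pi(s) + alpha * ([r + gamma V^pi(s')] - V^pi(s))
    V[s] = V.get(s, 0.0) + alpha * ((r + gamma * V.get(s2, 0.0)) - V.get(s, 0.0))
```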
Q-Learning (Goal: learn the optimal policy \( \pi^* \); sketch below)
- act according to some policy \( \pi \)
- after visiting the state-action pair (s, a), transitioning into state s' and receiving reward r:
\( Q(s,a) \leftarrow Q(s,a) + \alpha ([r + \gamma \max_{a'} Q(s',a')] - Q(s,a)) \)
- if \( \pi \) fulfills mild conditions (every state-action pair is visited sufficiently often), \( Q \rightarrow Q^* \)
- extract the optimal policy with policy extraction
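A compact Q-learning sketch with an epsilon-greedy behaviour policy; the `env.reset()` / `env.step(a) -> (s', r, done)` interface and all parameter values are assumptions for illustration, not something defined in these notes.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                            # Q(s, a), initialised to 0
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # behaviour policy: explore with prob. epsilon, otherwise act greedily w.r.t. Q
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s2, r, done = env.step(a)                 # assumed environment interface
            # Q(s,a) <- Q(s,a) + alpha * ([r + gamma max_a' Q(s',a')] - Q(s,a))
            target = r + gamma * max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q

# The optimal policy is then read off with policy extraction: pi(s) = argmax_a Q[(s, a)].
```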
DeepMind
Concepts
Definitions
Typically solve prediction to solve control
Three fundamental problems in sequential decision making
Scaling
New:
- model environments that go on forever -> need a discounting factor \( \gamma \)
- Expectimax only calculates the action at the root
Relations:
- policy <-> path (since we have uncertainty: learn a map from states to actions instead of a fixed path) :red_cross:
Agents
Reflex Agent
- consider how the world is
- choose an action based on the current percept and maybe memory
- may have memory or a model of the world's current state
- do not consider future consequences
Planning Agents:
- ask "what if?" (decisions based on consequences)
- requires a model of how the world works (necessary for considering consequences)
- goal
Bayes Net