Intelligent Systems
Classical AI
History and Philosophy
Search and Planning
Classical Tradeoffs
Speed
Memory
'Optimality'
Types of Problem
Single-state
Deterministic, fully observable
Agent knows exactly which state it will be in; solution is a sequence
Problem formulation
Initial state
Actions / Successor function
Goal Test
Explicit
e.g. "at Bucharest"
Implicit
e.g. checkmate
Path cost (additive)
e.g. sum of distances
Solution Formulation
A sequence of actions leading from the initial state to a goal state
Sensorless
Non-observable
Agent may have no idea where it is; solution is a sequence
Contingency
Nondeterministic and/or partially observable
percepts provide new information about current state
Often involve search, execution
Exploration
Unknown state space
Tree Search Algorithms
Basic Idea
Offline simulated exploration of state space by generating successors of already-explored states (aka expanding states)
Measures of space and time complexity
b
Maximum branching factor of the search tree
d
Depth of the least-cost solution
m
Maximum depth of the state space
Uninformed Strategies
Breadth-first search
Implementation
Expand shallowest node
fringe is a FIFO queue
Properties
Complete
= Yes (if b is finite)
Time
= 1 + b + b^2 + b^3 + ... + b^d + b(b^d -1) = O(b^(d+1))
Space
= O(b^(d+1)) (keeps every node in memory)
Optimal
= Yes (if cost = 1 per step)
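A minimal Python sketch of breadth-first search with a FIFO fringe; the successors(state) and goal_test(state) helpers are hypothetical stand-ins, not from the source:
from collections import deque

def breadth_first_search(start, goal_test, successors):
    # Fringe is a FIFO queue, so the shallowest unexpanded node is expanded first.
    fringe = deque([(start, [])])          # (state, list of actions taken so far)
    seen = {start}
    while fringe:
        state, path = fringe.popleft()
        if goal_test(state):
            return path                    # solution: a sequence of actions
        for action, next_state in successors(state):
            if next_state not in seen:     # avoid re-expanding repeated states
                seen.add(next_state)
                fringe.append((next_state, path + [action]))
    return None                            # no solution exists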
Depth-first search
Implementation
Expand deepest unexpanded node
fringe is a LIFO queue
Properties
Complete
= No (fails in infinite-depth spaces and spaces with loops; can be modified to avoid repeated states along the path, which makes it complete in finite spaces)
Time
= O(b^m); terrible if m is much larger than d, but can be much faster than breadth-first if solutions are dense
Space
= O(bm), linear!
Optimal
= No
Depth-limited search
Implementation
Depth-first search with depth limit n
Nodes at depth n have no successors
Properties
Complete = No (solution may be deeper than n)
Time = O(b^n)
Iterative deepening search
Implementation
Runs multiple depth-limited searches with increasing depth until it finds a result (see the sketch below)
Properties
Complete = Yes
Time = O(b^d)
Space = O(bd)
Optimal = Yes (If step cost = 1)
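A sketch of depth-limited search and iterative deepening built on top of it, with the same hypothetical successors/goal_test helpers as the BFS sketch:
def depth_limited_search(state, goal_test, successors, limit):
    # Returns a list of actions, or None if no solution within the depth limit.
    if goal_test(state):
        return []
    if limit == 0:
        return None                        # nodes at the depth limit get no successors
    for action, next_state in successors(state):
        result = depth_limited_search(next_state, goal_test, successors, limit - 1)
        if result is not None:
            return [action] + result
    return None

def iterative_deepening_search(start, goal_test, successors, max_depth=50):
    # Run depth-limited searches with an increasing limit until a result is found.
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal_test, successors, limit)
        if result is not None:
            return result
    return None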
Heuristic Searches
Impetus: Naive search spaces are usually much, much bigger than you expect
Best First
Use an evaluation function f(n) to estimate the desirability of each node and expand the most desirable un-expanded node.
A*
Implementation
Evaluation function is a combination of heuristic h(n) and cost to reach node n so far, g(n). i.e. f(n) = g(n) + h(n)
Theorem
If h(n) is admissible, A* using TREE-SEARCH is optimal.
This is not the case for GRAPH-SEARCH
Theorem
If h(n) is consistent, A* using GRAPH-SEARCH is optimal
Characteristics
A* expands all nodes such that f(n) < C*
Possibly expands some nodes where f(n) = C* (on the goal contour)
Never expands nodes where f(n) > C*
No other optimal algorithm is guaranteed to expand fewer nodes than A* (except possibly through tie-breaking among nodes with f(n) = C*)
Properties
Complete = Yes (unless infinitely many nodes with f <= f(G))
Time = Exponential in length of optimal solution
Space = Keeps all expanded nodes in memory
Optimal = Yes
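A minimal A* sketch with f(n) = g(n) + h(n) and a priority-queue fringe; successors(state) yielding (action, next_state, step_cost) and the heuristic h are assumed helpers:
import heapq
from itertools import count

def a_star_search(start, goal_test, successors, h):
    # Fringe is a priority queue ordered by f(n) = g(n) + h(n); the counter breaks ties.
    tie = count()
    fringe = [(h(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while fringe:
        f, _, g, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return path, g                 # optimal if h is admissible (tree) / consistent (graph)
        for action, next_state, step_cost in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(next_state, float('inf')):
                best_g[next_state] = new_g
                f_new = new_g + h(next_state)
                heapq.heappush(fringe, (f_new, next(tie), new_g, next_state, path + [action]))
    return None, float('inf')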
Greedy Best-First
Implementation
Best First search that expands purely based on the heuristic, selecting the option that appears to be best
Properties
Complete = No (can get stuck in loops, like depth-first, if the heuristic is bad)
Time = O(b^m), but a good heuristic can give dramatic improvements
Space = O(b^m), keeps all nodes in memory
Optimal = No
Admissible Heuristic
A heuristic is admissible if for every node n, h(n) <= h*(n), where h*(n) is the true cost of reaching the goal state from n
An admissible heuristic never over-estimates the cost to reach the goal.
Example
h_SLD(n), the straight-line distance from node n to the goal node, never over-estimates the actual road distance
Consistent Heuristic (or monotonic)
A heuristic is consistent if h(n) <= c(n,a,n') + h(n')
i.e. f(n) is non-decreasing along any path
Proof
If h is consistent, we have:
f(n') = g(n') + h(n')
f(n') = g(n) + c(n,a,n') + h(n')
f(n') >= g(n) + h(n)
f(n') >= f(n)
Dominating Heuristic
If h1 and h2 are both admissible and h1(n) >= h2(n) for all n, then h1 dominates h2
The dominating heuristic is the better one for searching.
Effective Branching Factor
The branching factor that a uniform tree of the solution depth would need in order to contain the number of nodes actually generated; used to compare heuristics
Relaxed Problems
A problem with fewer restrictions on it than the original problem
The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem
Depth-First Search vs Backtracking Search
Exact same order of traversal
Different internal structure
Depth first search adds all the successors to a queue
Backtracking iterates through the successors, recursing on each, and thereby does not require a fringe data structure
States vs Nodes
State
A representation of a physical configuration
Node
A data structure constituting part of a search tree; includes state, parent node, action, path cost g(x), and depth
Evaluation Dimensions
Completeness
Does it always find a solution if one exists?
Time Complexity
Number of nodes generated
Space Complexity
Maximum number of nodes in memory
Optimality
Does it always find a least-cost solution?
Graph Search
Basic Search
Bidirectional Search
Do two searches, one starting at the initial state and one starting at the goal state. b^(d/2) + b^(d/2) is much less than b^d.
At each step, check if a node is in the fringe of the other
Assumes Pred(x) is easy to compute
Easy if Pred(x) = Succ(x)
If many goal states:
Add a dummy node which has the true goal states as predecessors
Readings
Picard: Data
Turing: Turing Test
Searle: Chinese room
Brooks: Elephants
Constraint Satisfaction Problems
State
Variables Vi with values from domain Di
Goal Test
A set of constraints specifying allowable combinations of values for subsets of variables
Examples
Map colouring
Variables: {WA, NT, Q, NSW, V, SA, T}
(Regions of Australia)
Domains: {Red, Blue, Green}
Constraints: Adjacent regions must have different colours
N-Queens
Real-World
Assignment Problems
e.g. who teaches which class
Timetabling problems
e.g. which class is offered when and where
Transportation Scheduling
Factory scheduling
Binary CSP
Each constraint relates two variables
Constraint Graph
Nodes are variables, arcs are constraints
Varieties of CSP
Discrete Variables
Finite Domains
n variables
domain size d means O(d^n) complete assignments
Examples:
Boolean CSPs, Boolean Sat
Infinite Domains
integers, strings, etc..
Examples:
Job Scheduling
Continuous Variables
Linear constraints solvable in polynomial time by linear programming
Examples:
Start/end times for Hubble Space Telescope observations
Varieties of Constraints
Unary
Constraints involve a single variable
e.g. SA /= green
Binary
Constraints involve pairs of variables
e.g. SA /= WA
Higher-order
Constraints involve 3 or more variables
e.g. cryptarithmetic column constraints
Formulations
Standard
Initial State
The empty assignment
Successor function
Assign a value to an unassigned variable that does not conflict with current assignment.
Fail if no legal assignments
Goal test
The current assignment is complete
Properties
Solutions appear at depth n with n variables
-> Use depth-first search
Path is irrelevant
Backtracking search is the basic uninformed algorithm for CSPs
Heuristics Based
Most constrained variable
Choose the variable with the fewest legal values
As these are most likely to prune the search tree
Most constraining variable
The variable with the most constraints on remaining variables
Most constrained is more important, but most constraining is a useful tie-breaker
Least constraining value
Given a variable, choose the least constraining value.
That's the one that rules out the fewest values in the neighbouring variables to leave other variables as open as possible.
Requires forward checking.
Forward checking
Keep track of remaining legal values for unassigned variables
Terminate search when any variable has no legal values
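A sketch of backtracking search for the map-colouring CSP, using the most-constrained-variable heuristic and forward checking; the tiny Australia-style setup at the bottom is illustrative only:
def backtracking_search(domains, neighbours):
    # domains: var -> set of remaining legal values; neighbours: var -> adjacent vars
    # (binary constraint: adjacent regions must take different colours).
    assignment = {}

    def backtrack(domains):
        if len(assignment) == len(domains):
            return dict(assignment)
        # Most constrained variable: the unassigned variable with fewest legal values.
        var = min((v for v in domains if v not in assignment),
                  key=lambda v: len(domains[v]))
        for value in sorted(domains[var]):
            # Forward checking: remove this value from unassigned neighbours' domains.
            pruned = {n: domains[n] - {value}
                      for n in neighbours[var] if n not in assignment}
            if all(pruned.values()):       # terminate this branch if any domain is emptied
                assignment[var] = value
                result = backtrack({**domains, **pruned, var: {value}})
                if result is not None:
                    return result
                del assignment[var]
        return None

    return backtrack(domains)

domains = {v: {'Red', 'Green', 'Blue'} for v in ['WA', 'NT', 'SA']}
neighbours = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA'], 'SA': ['WA', 'NT']}
print(backtracking_search(domains, neighbours))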
Local Searches
Examples
Hill-climbing, simulated annealing
How to apply to CSPs
allow states with constraint violations
operators reassign variable values to try to minimise/eliminate conflicts
Value selection methods
Min-Conflicts
Choose value that violates the fewest constraints
Iterative Min-Conflicts
While we are not at a solution:
Choose a random conflicted variable
Pick a value for this that has minimum conflicts
Works really well on n-queens due to dense solution space
Very fast but not complete
Can get stuck at local optima (see the n-queens sketch below)
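An iterative min-conflicts sketch for n-queens (one queen per column, the state being the row of each queen), as described above; not complete, so it may return None if it gets stuck:
import random

def conflicts(rows, col, row):
    # Number of other queens that would attack a queen placed at (col, row):
    # same row or same diagonal (one queen per column, so columns never clash).
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, max_steps=10000):
    rows = [random.randrange(n) for _ in range(n)]      # random complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                                 # no constraint violations left
        col = random.choice(conflicted)                 # random conflicted variable
        rows[col] = min(range(n),                       # value with minimum conflicts
                        key=lambda r: conflicts(rows, col, r))
    return None                                         # stuck at a local optimum

print(min_conflicts(8))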
Adversarial Searches
Adversarial
Type of search where there is an opponent who also takes moves that must be accounted for
Minimax
Game Tree Search
Goal State
Winning positions
Actions
One for each legal move
Expand Function
Generate legal moves
Evaluation Function
Assigns score to each board state
Game tree
All possible games
Initial State
Initial board state
Minimax Search
MAX
The player, picks moves whose heuristic value is best (maximum)
MIN
The opponent, picks moves that minimise MAX's advantage
Search to a given ply using depth-limited search
Evaluate heuristics for leaf nodes
Internal nodes propagate heuristic values towards tree root
MAX nodes take the maximum of their child values (ie taking the best move they can)
MIN nodes take the minimum of their child values (we assume our adversary makes the move that is worst for us)
Pick action with best guaranteed score
Minimax Algorithm
MAX-VALUE(state, depth)
if depth == 0 then return EVAL(state)
v = -inf
for each s in SUCCESSORS(state) do
v = MAX(v, MIN-VALUE(s, depth-1))
end
return v
MIN-VALUE(state, depth)
if depth == 0 then return EVAL(state)
v = inf
for each s in SUCCESSORS(state) do
v = MIN(v, MAX-VALUE(s, depth -1))
end
return v
Properties
Complete = Yes (if tree is finite)
Optimal = Yes (against an optimal opponent)
Time complexity = O(b^m)
Space complexity = O(bm) - depth first exploration
Evaluation Functions
Typically a linear function in which coefficients are used to weight game features
Unlikely to be a perfect, computable evaluation function for most games
Game with uncertainty (backgammon, etc) add notions of expectation.
The Horizon Effect
Cannot exhaustively search most game trees
Significant events may exist just beyond the part of the tree we have searched
The further we look ahead, the better our evaluation of a position
If we're searching the game tree to a depth of n ply, what happens if our opponent is looking n+1 moves ahead?
Quiescent Search
The quiescence search attempts to mitigate the horizon effect by evaluating extra nodes. The extra nodes are specifically the 'non-quiescent' or volatile ones until the whole situation is relatively stable.
Ply Depth
The number of steps you look ahead
Very important for succeeding
Going Beyond Minimax
Strong Methods
Can only be applied to specific instances, takes account of the structure of the game itself
e.g. board symmetry
Board Symmetry
In Noughts and Crosses, the board is symmetrical and most initial moves are symmetrical and equivalent
Can be used to reduce initial states from 9 to 3 as you only consider corner, edge and centre
Gradually loses effect as more marks on the board mean fewer possible symmetries
Weak Methods
Can apply to any game
e.g. alpha-beta pruning
Alpha-beta Pruning
Performs DFS the same as minimax but allows us to disregard certain branches of the tree
Alpha
Represents the lower bound on node value (the worst we can do)
Associated with MAX nodes
Starts low and increases
(Never decreases)
Beta
Represents the upper bound on node value (the best we can do)
Associated with MIN nodes
Starts high and decreases
(Never increases)
If the best we can do on the current branch is <= the worst we can do elsewhere, there's no point continuing this branch
Properties
Guaranteed to give the same values as Minimax
If the tree is perfectly ordered, complexity is O(b^(d/2))
Minimax is O(b^d)
With alpha-beta you can search twice the depth with the same effort
Perfect ordering is not possible; if it were, we wouldn't need alpha-beta
In practice, running time close to optimistic estimate
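A minimal alpha-beta sketch mirroring the minimax pseudocode above; SUCCESSORS-style move generation and the EVAL evaluation function are passed in as assumed helpers, and terminal-position handling is omitted for brevity:
import math

def alpha_beta(state, depth, alpha, beta, maximizing, successors, evaluate):
    # Same depth-first search as minimax, but branches that cannot affect
    # the final decision (beta <= alpha) are pruned.
    if depth == 0:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for s in successors(state):
            value = max(value, alpha_beta(s, depth - 1, alpha, beta, False,
                                          successors, evaluate))
            alpha = max(alpha, value)      # alpha only ever increases (MAX's lower bound)
            if beta <= alpha:
                break                      # MIN would never let play reach this branch
        return value
    else:
        value = math.inf
        for s in successors(state):
            value = min(value, alpha_beta(s, depth - 1, alpha, beta, True,
                                          successors, evaluate))
            beta = min(beta, value)        # beta only ever decreases (MIN's upper bound)
            if beta <= alpha:
                break                      # MAX already has a better option elsewhere
        return value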
Learning AI
Types of learning
Supervised
Labels
Tune model to predict labels
Unsupervised
No labels
Find patterns/clusters in data
Perceptrons
Feature Space
Collection of features
A feature is a measurable piece of data that can be used for analysis:
Name, Age, Sex, etc etc
Perceptron Algorithm
Updates driven by mis-classification
Converges only if data is linearly separable. Even then there can be multiple solutions
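A sketch of the perceptron update rule, where updates are driven purely by misclassified points and the bias is folded in as an extra weight w0 on a constant input of 1; the tiny AND-style dataset is illustrative only:
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    # X: (n_samples, n_features); y: labels in {-1, +1}.
    X = np.hstack([np.ones((X.shape[0], 1)), X])       # prepend bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:                 # misclassified (or on the boundary)
                w += lr * yi * xi                       # update driven by misclassification
                errors += 1
        if errors == 0:                                 # converged: data was linearly separable
            break
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))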
Loss function
A quantification of the mismatch between model output and real output
Number of misclassifications
Discrete Steps
Hard to train
Perceptron Criterion
Smooth
Differentiable
Can implement gradient descent
Multi-layered Perceptrons
Dealing with non-linear boundaries
Non-linear activation
Example: ReLU
Multiple Linear maps
To separate XOR, use two sets of weights and a sigmoid as the threshold function
Internal Representation
Hidden Layer (H)
Inputs are mapped to N nodes rather than 1. N is the number of sets of weights the inputs are dot-producted with
Then these N nodes are mapped to a single node like a standard Perceptron with another weights set
Activation Function
A function that defines the output of a node for all its inputs
Threshold Function
With bias
y = {1, if x>0
-1/0 otherwise (depending on model)}
Without bias
y = {1, if x>(theta)
-1/0 otherwise (depending on model)}
where theta needs to be specified
How the threshold function works depends on whether the algorithm includes a bias. A bias is an extra input x0 whose value is 1, together with an extra weight w0. This removes the need to specify the threshold value and means the threshold can update as the weights change.
ReLU Function
y = {x, if x>0
0, otherwise}
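The threshold and ReLU activations above as small Python functions (using the {1, -1} output convention for the threshold):
def threshold(x, theta=0.0):
    # Step activation: +1 above the threshold theta, -1 otherwise.
    # With a bias input, theta can stay at 0 and effectively be learned through w0.
    return 1 if x > theta else -1

def relu(x):
    # Rectified linear unit: passes positive inputs through, clamps the rest to 0.
    return x if x > 0 else 0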
Bayesian Networks
Probability
Marginal Probability
P(A = a)
Obtained through marginalization
P(A = a) = P( A = a ^ B = b1) + P( A = a ^ B = b2) + ...
Distributions
Marginal Distribution
Contains information on purely the marginal probabilities of the variables
Joint Distribution
Contains the maximum amount of possible information as it shows all possible joint probabilities for given variables
Conditional Probability
P(A = a | B = b) :
Joint Probability
P( A = a ^ B = b)
Conditional Independence
A and B are conditionally independent given C if
P(A | B, C) = P(A | C)
If we know about C, then learning about B adds no more information about A
Basic Relations
Disconnected
Serial
Diverging (Common-cause/confounder)
Converging (Multiple cause/collider/selection)
Bayes Rule Derivation
P(A ^ B)
= P(B|A) * P(A)
= P(A|B) * P(B)
Since P(A ^ B) == P(A ^ B)
P(B|A) * P(A) == P(A|B) * P(B)
Therefore
P(B|A) = P(A|B) * P(B) / P(A)
P(A|B) = P(B|A) * P(A) / P(B)
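A small numeric check of the rearrangement, using made-up probabilities purely for illustration:
# Hypothetical values: P(A) = 0.3, P(B) = 0.2, P(A|B) = 0.5.
p_a, p_b, p_a_given_b = 0.3, 0.2, 0.5

p_joint = p_a_given_b * p_b        # P(A ^ B) = P(A|B) * P(B) = 0.1
p_b_given_a = p_joint / p_a        # P(B|A) = P(A|B) * P(B) / P(A) = 0.333...
print(p_b_given_a, p_b_given_a * p_a)   # multiplying back by P(A) recovers the same joint, ~0.1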
Reinforcement Learning
Supervised Learning vs. Reinforcement Learning
Supervised Learning
Instruct correct answer/label
Reinforcement Learning
Evaluate action undertaken
Terms
Utility
Desirability of outcome
Utility Function
V(s), V: S -> Real
Calculates desirability of input state
Expected Utility
EV(a)
The expected value of an action to an agent, calculated by multiplying the value to the agent of each possible outcome of the action by the probability of that outcome occurring and then summing those numbers
Action
A transition between states in a given environment
P(s'|s,a)
Probability of being in s' upon taking action a from s
argmax F(x)
A function that chooses x such that it maximises F
Used in RL with a* = argmax_a EV(a), over all actions a in A, to choose the action that maximises expected utility.
Greedy
An algorithm/choice that may depend on choices made so far/data collected so far but not on future choices or all solutions to the problem
Exploration vs Exploitation
Sometimes select a non-greedy, uncertain choice over the certain greedy choice: the current model could be imperfect, the non-greedy choice could turn out to be better, and trying it improves the estimate.
I Function
I[a] = 1 if pred a is true; 0 otherwise
Qt(a) Function
Qt(a) calculates the average reward for a given action up to time t
This value is an estimate of the true value
Calculated by dividing the sum of the rewards for that action by the number of times that action was used.
Nt(a)
Function that calculates how many times an action was taken up to time t
q*(a) Function
q*(a) is the actual expected reward for action a.
Qt(a) converges to q*(a) as Nt(a) -> infinity
A greedy algorithm maximises Qt(a)
Exploration
Collect data on which inputs give the best reward
With probability e, explore non-greedy options
How do you choose epsilon (e)?
Epsilon-First Method
0<=t<=T where T = number of actions:
Explore for 0<=t<=eT
Exploit for eT<=t<=T
Where:
e < 1 (epsilon)
t = current action number
Epsilon-Greedy Method
At each choice:
Exploit optimal choice with probability (1-e)
Explore one of remaining (k-1) arms with probability e
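An epsilon-greedy bandit sketch using the sample-average estimate Qt(a); the Gaussian reward model and the arm means are made-up assumptions for illustration:
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1):
    k = len(true_means)
    Q = [0.0] * k                     # Qt(a): sample-average estimate of each arm's value
    N = [0] * k                       # Nt(a): how many times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: Q[i])     # exploit: greedy arm
        reward = random.gauss(true_means[a], 1.0)     # assumed noisy reward
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]                # incremental sample average
        total_reward += reward
    return Q, total_reward

print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))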
Goal
Optimise strategy to maximise long-term rewards
The earlier the expected rewards converge, the better the strategy
Exploitation
Use the data collected in exploration to choose the best input
Exploits current knowledge to maximise immediate reward
Markov Decision Process
Action chosen dependent on situation/state
Agent receives evaluative feedback
Estimating q*(s, a), the best action to take given the state
Rather than q*(a) as in the bandit problem
Sequential Decision Making
Actions (a) influence immediate rewards and future states.
Trajectory Looks like:
S0, A0, R1, S1, A1, R2, S2, A2, R3 ...
Delayed and immediate rewards tradeoff
Markov Assumption
At each step, the next state and reward depend only on the current state and the action taken; no memory
Markovian Dynamics
State and reward at time t (St, Rt) depend only on S(t-1) and A(t-1). No memory used.
Types of MDP
Finite
An instance of an MDP where the states, actions, and rewards (S, A, R) are finite
Infinite
Policy
A rule for choosing which action to take in each state, which determines the rewards received
Deterministic
Probability of action = 1
Non-deterministic
pi_t(a | s) = P(A_t = a | S_t = s)
For reinforcement learning: change policy to maximise reward in the long run.
Rewards
Future Rewards
Gt = R(t+1) + R(t+2) + ... + R(T)
Discounted Future Rewards
Gt = R(t+1) + y*R(t+2) + (y^2)*R(t+3) + ...
Where
0 <= y < 1
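Computing a discounted return Gt from a list of future rewards, with a made-up reward sequence and discount factor y (gamma):
def discounted_return(rewards, gamma=0.9):
    # rewards = [R(t+1), R(t+2), ...]; Gt = R(t+1) + gamma*R(t+2) + gamma^2*R(t+3) + ...
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(discounted_return([1, 1, 1, 10]))   # later rewards are weighted down by gamma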
Policy Comparison
Policy pi is better than pi' if the expected reward V_pi(s) >= V_pi'(s) for all s in S
Policy Function pi
pi(a|s)
Tells you which action to take when you're in state s
Optimal Policy
The policy that is better than all others
Optimal Value Functions
Optimal state-value function V*
V*(s) = max_pi V_pi(s), for all s in S
For all policies pi, choose the one that gives the largest Value for each state.
Optimal action-value function Q*
Q*(s, a) = max_pi Q_pi(s, a), for all s in S, a in A
Used to keep track of reward received for specific actions. Choose the one that maximises the action value function Q(pi)
Learn Policy
Learn a map from states to probability of action based on rewards
Reinforcement Learning
Updating policy as result of experience to maximise reward in the long run
Learn optimal action policy
pi* : S -> A
Map from states to actions that maximises the value V_pi(s) - the expected discounted future reward
pi* = argmax_pi V_pi(s) for all s in S
Unlike supervised learning, to learn pi*, we do not have examples of form (state, action), instead we have ((state, action), reward)
Calculating the value of a state
Value of state s given policy pi
V_pi(s) = E_pi[Gt | St = s]
Calculates expected discounted future reward used to guide future actions
However, in Reinforcement Learning, we do not know pi(a|s) or p(s',r | s, a)
Can be expressed using the Bellman equation for V
That expresses the value of the state in terms of the values of the successor states
Estimating Vpi(s)
For RL
Temporal Difference
TD(lambda)
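A minimal TD(0) sketch for estimating V_pi(s) from experience, as the last nodes suggest; env.reset()/env.step() and policy(state) are hypothetical interfaces, not from the source:
from collections import defaultdict

def td0_value_estimation(env, policy, episodes=1000, alpha=0.1, gamma=0.9):
    # Assumes env.reset() -> state and env.step(action) -> (next_state, reward, done),
    # and policy(state) -> action; both interfaces are assumptions for this sketch.
    V = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD(0): move V(s) towards the bootstrapped target r + gamma * V(s').
            V[state] += alpha * (reward + gamma * V[next_state] - V[state])
            state = next_state
    return dict(V)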