Intelligent Systems
Classical AI
History and Philosophy
Search and Planning
Classical Tradeoffs
Speed
Memory
'Optimality'
Types of Problem
Single-state
Deterministic, fully observable
Agent knows exactly which state it will be in; solution is a sequence
Problem formulation
Initial state
Actions / Successor function
Goal Test
Explicit
e.g. "at Bucharest"
Implicit
e.g. checkmate
Path cost (additive)
e.g. sum of distances
Solution Formulation
A sequence of actions leading from the initial state to a goal state
Sensorless
Non-observable
Agent may have no idea where it is; solution is a sequence
Contingency
Nondeterministic and/or partially observable
percepts provide new information about current state
Often involve search, execution
Exploration
Unknown state space
Tree Search Algorithms
Basic Idea
Offline simulated exploration of state space by generating successors of already-explored states (aka expanding states)
Measures of space and time complexity
b
Maximum branching factor of the search tree
d
Depth of the least-cost solution
m
Maximum depth of the state space
Uninformed Strategies
Breadth-first search
Implementation
Expand shallowest node
fringe is a FIFO queue
Properties
Complete
= Yes (if b is finite)
Time
= 1 + b + b^2 + b^3 + ... + b^d + b(b^d -1) = O(b^(d+1))
Space
= O(b^(d+1)) (keeps every node in memory)
Optimal
= Yes (if cost = 1 per step)
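A minimal Python sketch of breadth-first search with a FIFO fringe; the successors(state) and goal_test(state) helpers are hypothetical stand-ins, not from the source:
from collections import deque

def breadth_first_search(start, goal_test, successors):
    # Fringe is a FIFO queue, so the shallowest unexpanded node is expanded first.
    fringe = deque([(start, [])])          # (state, list of actions taken so far)
    seen = {start}
    while fringe:
        state, path = fringe.popleft()
        if goal_test(state):
            return path                    # solution: a sequence of actions
        for action, next_state in successors(state):
            if next_state not in seen:     # avoid re-expanding repeated states
                seen.add(next_state)
                fringe.append((next_state, path + [action]))
    return None                            # no solution exists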
Depth-first search
Implementation
Expand deepest unexpanded node
fringe is a LIFO queue
Properties
Complete
= No (fails in infinite-depth spaces and spaces with loops; can be modified to avoid repeated states along the path, which makes it complete in finite spaces)
Time
= O(b^m); terrible if m is much larger than d, but can be much faster than breadth-first if solutions are dense
Space
= O(bm), linear!
Optimal
= No
Depth-limited search
Implementation
Depth-first search with depth limit n
Nodes at depth n have no successors
Properties
Complete = No (solution may be deeper than n)
Time = O(b^n)
Iterative deepening search
Implementation
Runs multiple depth-limited searches with increasing depth until it finds a result (see the sketch below)
Properties
Complete = Yes
Time = O(b^d)
Space = O(bd)
Optimal = Yes (If step cost = 1)
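A sketch of depth-limited search and iterative deepening built on top of it, with the same hypothetical successors/goal_test helpers as the BFS sketch:
def depth_limited_search(state, goal_test, successors, limit):
    # Returns a list of actions, or None if no solution within the depth limit.
    if goal_test(state):
        return []
    if limit == 0:
        return None                        # nodes at the depth limit get no successors
    for action, next_state in successors(state):
        result = depth_limited_search(next_state, goal_test, successors, limit - 1)
        if result is not None:
            return [action] + result
    return None

def iterative_deepening_search(start, goal_test, successors, max_depth=50):
    # Run depth-limited searches with an increasing limit until a result is found.
    for limit in range(max_depth + 1):
        result = depth_limited_search(start, goal_test, successors, limit)
        if result is not None:
            return result
    return None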
Heuristic Searches
Impetus: Naive search spaces are usually much, much bigger than you expect
Best First
Use an evaluation function f(n) to estimate the desirability of each node and expand the most desirable un-expanded node.
A*
Implementation
Evaluation function is a combination of heuristic h(n) and cost to reach node n so far, g(n). i.e. f(n) = g(n) + h(n)
Theorem
If h(n) is admissible, A* using TREE-SEARCH is optimal.
This is not the case for GRAPH-SEARCH
Theorem
If h(n) is consistent, A* using GRAPH-SEARCH is optimal
Characteristics
A* expands all nodes such that f(n) < C*
Possibly expands some nodes where f(n) = C* (on the goal contour)
Never expands nodes where f(n) > C*
No other optimal algorithm is guaranteed to expand fewer nodes than A* (except possibly through tie-breaking among nodes with f(n) = C*)
Properties
Complete = Yes (unless infinitely many nodes with f <= f(G))
Time = Exponential in length of optimal solution
Space = Keeps all expanded nodes in memory
Optimal = Yes
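A minimal A* sketch with f(n) = g(n) + h(n) and a priority-queue fringe; successors(state) yielding (action, next_state, step_cost) and the heuristic h are assumed helpers:
import heapq
from itertools import count

def a_star_search(start, goal_test, successors, h):
    # Fringe is a priority queue ordered by f(n) = g(n) + h(n); the counter breaks ties.
    tie = count()
    fringe = [(h(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while fringe:
        f, _, g, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return path, g                 # optimal if h is admissible (tree) / consistent (graph)
        for action, next_state, step_cost in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(next_state, float('inf')):
                best_g[next_state] = new_g
                f_new = new_g + h(next_state)
                heapq.heappush(fringe, (f_new, next(tie), new_g, next_state, path + [action]))
    return None, float('inf')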
Greedy Best-First
Implementation
Best First search that expands purely based on the heuristic, selecting the option that appears to be best
Properties
Complete = No (can get stuck in loops, like depth-first, if the heuristic is bad)
Time = O(b^m), but a good heuristic can give dramatic improvements
Space = O(b^m), keeps all nodes in memory
Optimal = No
Admissible Heuristic
A heuristic is admissible if for every node n, h(n) <= h*(n), where h*(n) is the true cost of reaching the goal state from n
An admissible heuristic never over-estimates the cost to reach the goal.
Example
h_SLD(n), the straight-line distance from node n to the goal node, never over-estimates the actual road distance
Consistent Heuristic (or monotonic)
A heuristic is consistent if h(n) <= c(n,a,n') + h(n')
i.e. f(n) is non-decreasing along any path
Proof
If h is consistent, we have:
f(n') = g(n') + h(n')
f(n') = g(n) + c(n,a,n') + h(n')
f(n') >= g(n) + h(n)
f(n') >= f(n)
Dominating Heuristic
If h1 and h2 are both admissible and h1(n) >= h2(n) for all n, then h1 dominates h2
The dominating heuristic is the better one for searching.
Effective Branching Factor
The branching factor that a uniform tree of the solution depth would need in order to contain the number of nodes actually generated; used to compare heuristics
Relaxed Problems
A problem with fewer restrictions on it than the original problem
The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem
Depth-First Search vs Backtracking Search
Exact same order of traversal
Different internal structure
Depth first search adds all the successors to a queue
Backtracking iterates through the successors, recursing on each, and thereby does not require a fringe data structure
States vs Nodes
State
A representation of a physical configuration
Node
A data structure constituting part of a search tree; includes state, parent node, action, path cost g(x), and depth
Evaluation Dimensions
Completeness
Does it always find a solution if one exists?
Time Complexity
Number of nodes generated
Space Complexity
Maximum number of nodes in memory
Optimality
Does it always find a least-cost solution?
Graph Search
Basic Search
Bidirectional Search
Do two searches, one starting at the initial state and one starting at the goal state. b^(d/2) + b^(d/2) is much less than b^d.
At each step, check if a node is in the fringe of the other
Assumes Pred(x) is easy to compute
Easy if Pred(x) = Succ(x)
If many goal states:
Add a dummy node which has the true goal states as predecessors
Readings
Picard: Data
Turing: Turing Test
Searle: Chinese room
Brooks: Elephants
Constraint Satisfaction Problems
State
Variables Vi with values from domain Di
Goal Test
A set of constraints specifying allowable combinations of values for subsets of variables
Examples
Map colouring
Variables: {WA, NT, Q, NSW, V, SA, T}
(Regions of Australia)
Domains: {Red, Blue, Green}
Constraints: Adjacent regions must have different colours
N-Queens
Real-World
Assignment Problems
e.g. who teaches which class
Timetabling problems
e.g. which class is offered when and where
Transportation Scheduling
Factory scheduling
Binary CSP
Each constraint relates two variables
Constraint Graph
Nodes are variables, arcs are constraints
Varieties of CSP
Discrete Variables
Finite Domains
n variables
domain size d means O(d^n) complete assignments
Examples:
Boolean CSPs, Boolean Sat
Infinite Domains
integers, strings, etc..
Examples:
Job Scheduling
Continuous Variables
Linear constraints solvable in polynomial time by linear programming
Examples:
Start/end times for Hubble Space Telescope observations
Varieties of Constraints
Unary
Constraints involve a single variable
e.g. SA /= green
Binary
Constraints involve pairs of variables
e.g. SA /= WA
Higher-order
Constraints involve 3 or more variables
e.g. cryptarithmetic column constraints
Formulations
Standard
Initial State
The empty assignment
Successor function
Assign a value to an unassigned variable that does not conflict with current assignment.
Fail if no legal assignments
Goal test
The current assignment is complete
Properties
Solutions appear at depth n with n variables
-> Use depth-first search
Path is irrelevant
Backtracking search is the basic uninformed algorithm for CSPs
Heuristics Based
Most constrained variable
Choose the variable with the fewest legal values
As these are most likely to prune the search tree
Most constraining variable
The variable with the most constraints on remaining variables
Most constrained is more important, but most constraining is a useful tie-breaker
Least constraining value
Given a variable, choose the least constraining value.
That's the one that rules out the fewest values in the neighbouring variables to leave other variables as open as possible.
Requires forward checking.
Forward checking
Keep track of remaining legal values for unassigned variables
Terminate search when any variable has no legal values
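A sketch of backtracking search for the map-colouring CSP, using the most-constrained-variable heuristic and forward checking; the tiny Australia-style setup at the bottom is illustrative only:
def backtracking_search(domains, neighbours):
    # domains: var -> set of remaining legal values; neighbours: var -> adjacent vars
    # (binary constraint: adjacent regions must take different colours).
    assignment = {}

    def backtrack(domains):
        if len(assignment) == len(domains):
            return dict(assignment)
        # Most constrained variable: the unassigned variable with fewest legal values.
        var = min((v for v in domains if v not in assignment),
                  key=lambda v: len(domains[v]))
        for value in sorted(domains[var]):
            # Forward checking: remove this value from unassigned neighbours' domains.
            pruned = {n: domains[n] - {value}
                      for n in neighbours[var] if n not in assignment}
            if all(pruned.values()):       # terminate this branch if any domain is emptied
                assignment[var] = value
                result = backtrack({**domains, **pruned, var: {value}})
                if result is not None:
                    return result
                del assignment[var]
        return None

    return backtrack(domains)

domains = {v: {'Red', 'Green', 'Blue'} for v in ['WA', 'NT', 'SA']}
neighbours = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA'], 'SA': ['WA', 'NT']}
print(backtracking_search(domains, neighbours))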
Local Searches
Examples
Hill-climbing, simulated annealing
How to apply to CSPs
allow states with constraint violations
operators reassign variable values to try to minimise/eliminate conflicts
Value selection methods
Min-Conflicts
Choose value that violates the fewest constraints
Iterative Min-Conflicts
While we are not at a solution:
Choose a random conflicted variable
Pick a value for this that has minimum conflicts
Works really well on n-queens due to dense solution space
Very fast but not complete
Can get stuck at local optima (see the n-queens sketch below)
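An iterative min-conflicts sketch for n-queens (one queen per column, the state being the row of each queen), as described above; not complete, so it may return None if it gets stuck:
import random

def conflicts(rows, col, row):
    # Number of other queens that would attack a queen placed at (col, row):
    # same row or same diagonal (one queen per column, so columns never clash).
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, max_steps=10000):
    rows = [random.randrange(n) for _ in range(n)]      # random complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                                 # no constraint violations left
        col = random.choice(conflicted)                 # random conflicted variable
        rows[col] = min(range(n),                       # value with minimum conflicts
                        key=lambda r: conflicts(rows, col, r))
    return None                                         # stuck at a local optimum

print(min_conflicts(8))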
Adversarial Searches
Adversarial
Type of search where there is an opponent who also takes moves that must be accounted for
Minimax
Game Tree Search
Goal State
Winning positions
Actions
One for each legal move
Expand Function
Generate legal moves
Evaluation Function
Assigns score to each board state
Game tree
All possible games
Initial State
Initial board state
Minimax Search
MAX
The player, picks moves whose heuristic value is best (maximum)
MIN
The opponent, picks moves that minimise MAX's advantage
Search to a given ply using depth-limited search
Evaluate heuristics for leaf nodes
Internal nodes propagate heuristic values towards tree root
MAX nodes take the maximum of their child values (ie taking the best move they can)
MIN nodes take the minimum of their child values (we assume our adversary makes the move that is worst for us)
Pick action with best guaranteed score
Minimax Algorithm
MAX-VALUE(state, depth)
if depth == 0 then return EVAL(state)
v = -inf
for each s in SUCCESSORS(state) do
v = MAX(v, MIN-VALUE(s, depth-1))
end
return v
MIN-VALUE(state, depth)
if depth == 0 then return EVAL(state)
v = inf
for each s in SUCCESSORS(state) do
v = MIN(v, MAX-VALUE(s, depth -1))
end
return v
Properties
Complete = Yes (if tree is finite)
Optimal = Yes (against an optimal opponent)
Time complexity = O(b^m)
Space complexity = O(bm) - depth first exploration
Evaluation Functions
Typically a linear function in which coefficients are used to weight game features
Unlikely to be a perfect, computable evaluation function for most games
Game with uncertainty (backgammon, etc) add notions of expectation.
The Horizon Effect
Cannot exhaustively search most game trees
Significant events may exist just beyond the part of the tree we have searched
The further we look ahead, the better our evaluation of a position
If we're searching the game tree to a depth of n ply, what happens if our opponent is looking n+1 moves ahead?
Quiescent Search
The quiescence search attempts to mitigate the horizon effect by evaluating extra nodes. The extra nodes are specifically the 'non-quiescent' or volatile ones until the whole situation is relatively stable.
Ply Depth
The number of steps you look ahead
Very important for succeeding
Going Beyond Minimax
Strong Methods
Can only be applied to specific instances, takes account of the structure of the game itself
e.g. board symmetry
Board Symmetry
In Noughts and Crosses, the board is symmetrical and most initial moves are symmetrical and equivalent
Can be used to reduce initial states from 9 to 3 as you only consider corner, edge and centre
Gradually loses effect as more marks on the board mean fewer possible symmetries
Weak Methods
Can apply to any game
e.g. alpha-beta pruning
Alpha-beta Pruning
Performs DFS the same as minimax but allows us to disregard certain branches of the tree
Alpha
Represents the lower bound on node value (the worst we can do)
Associated with MAX nodes
Starts low and increases
(Never decreases)
Beta
Represents the upper bound on node value (the best we can do)
Associated with MIN nodes
Starts high and decreases
(Never increases)
If the best we can do on the current branch is <= the worst we can do elsewhere, there's no point continuing this branch
Properties
Guaranteed to give the same values as Minimax
If the tree is perfectly ordered, complexity is O(b^(d/2))
Minimax is O(b^d)
With alpha-beta you can search twice the depth with the same effort
Perfect ordering is not possible; if it were, we wouldn't need alpha-beta
In practice, running time close to optimistic estimate
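A minimal alpha-beta sketch mirroring the minimax pseudocode above; SUCCESSORS-style move generation and the EVAL evaluation function are passed in as assumed helpers, and terminal-position handling is omitted for brevity:
import math

def alpha_beta(state, depth, alpha, beta, maximizing, successors, evaluate):
    # Same depth-first search as minimax, but branches that cannot affect
    # the final decision (beta <= alpha) are pruned.
    if depth == 0:
        return evaluate(state)
    if maximizing:
        value = -math.inf
        for s in successors(state):
            value = max(value, alpha_beta(s, depth - 1, alpha, beta, False,
                                          successors, evaluate))
            alpha = max(alpha, value)      # alpha only ever increases (MAX's lower bound)
            if beta <= alpha:
                break                      # MIN would never let play reach this branch
        return value
    else:
        value = math.inf
        for s in successors(state):
            value = min(value, alpha_beta(s, depth - 1, alpha, beta, True,
                                          successors, evaluate))
            beta = min(beta, value)        # beta only ever decreases (MIN's upper bound)
            if beta <= alpha:
                break                      # MAX already has a better option elsewhere
        return value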
Learning AI
Types of learning
Supervised
Labels
Tune model to predict labels
Unsupervised
No labels
Find patterns/clusters in data
Perceptrons
Feature Space
Collection of features
A feature is a measurable piece of data that can be used for analysis:
Name, Age, Sex, etc etc
Perceptron Algorithm
Updates driven by mis-classification
Converges only if data is linearly separable. Even then there can be multiple solutions
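A sketch of the perceptron update rule, where updates are driven purely by misclassified points and the bias is folded in as an extra weight w0 on a constant input of 1; the tiny AND-style dataset is illustrative only:
import numpy as np

def train_perceptron(X, y, epochs=100, lr=1.0):
    # X: (n_samples, n_features); y: labels in {-1, +1}.
    X = np.hstack([np.ones((X.shape[0], 1)), X])       # prepend bias input x0 = 1
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:                 # misclassified (or on the boundary)
                w += lr * yi * xi                       # update driven by misclassification
                errors += 1
        if errors == 0:                                 # converged: data was linearly separable
            break
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])
print(train_perceptron(X, y))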
Loss function
A quantification of the mismatch between model output and real output
Number of misclassifications
Discrete Steps
Hard to train
Perceptron Criterion
Smooth
Differentiable
Can implement gradient descent
Multi-layered Perceptrons
Dealing with non-linear boundaries
Non-linear activation
Example: ReLU
Multiple Linear maps
To separate XOR, use two sets of weights and a sigmoid as the threshold function
Internal Representation
Hidden Layer (H)
Inputs are mapped to N nodes rather than 1. N is the number of sets of weights the inputs are dot-producted with
Then these N nodes are mapped to a single node like a standard Perceptron with another weights set
Activation Function
A function that defines the output of a node for all its inputs
Threshold Function
With bias
y = {1, if x>0
-1/0 otherwise (depending on model)}
Without bias
y = {1, if x>(theta)
-1/0 otherwise (depending on model)}
where theta needs to be specified
How the threshold function works depends on whether the algorithm includes a bias. A bias is an extra input x0 whose value is 1, together with an extra weight w0. This removes the need to specify the threshold value and means the threshold can update as the weights change.
ReLU Function
y = {x, if x>0
0, otherwise}
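The threshold and ReLU activations above as small Python functions (using the {1, -1} output convention for the threshold):
def threshold(x, theta=0.0):
    # Step activation: +1 above the threshold theta, -1 otherwise.
    # With a bias input, theta can stay at 0 and effectively be learned through w0.
    return 1 if x > theta else -1

def relu(x):
    # Rectified linear unit: passes positive inputs through, clamps the rest to 0.
    return x if x > 0 else 0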
Bayesian Networks
Probability
Marginal Probability
P(A = a)
Obtained through marginalization
P(A = a) = P( A = a ^ B = b1) + P( A = a ^ B = b2) + ...
Distributions
Marginal Distribution
Contains information on purely the marginal probabilities of the variables
Joint Distribution
Contains the maximum amount of possible information as it shows all possible joint probabilities for given variables
Conditional Probability
P(A = a | B = b) :
Joint Probability
P( A = a ^ B = b)
Conditional Independence
A and B are conditionally independent given C if
P(A | B, C) = P(A | C)
If we know about C, then learning about B adds no more information about A
Basic Relations
Disconnected
Serial
Diverging (Common-cause/confounder)
Converging (Multiple cause/collider/selection)
Bayes Rule Derivation
P(A ^ B)
= P(B|A) * P(A)
= P(A|B) * P(B)
Since P(A ^ B) == P(A ^ B)
P(B|A) * P(A) == P(A|B) * P(B)
Therefore
P(B|A) = P(A|B) * P(B) / P(A)
P(A|B) = P(B|A) * P(A) / P(B)
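A small numeric check of the rearrangement, using made-up probabilities purely for illustration:
# Hypothetical values: P(A) = 0.3, P(B) = 0.2, P(A|B) = 0.5.
p_a, p_b, p_a_given_b = 0.3, 0.2, 0.5

p_joint = p_a_given_b * p_b        # P(A ^ B) = P(A|B) * P(B) = 0.1
p_b_given_a = p_joint / p_a        # P(B|A) = P(A|B) * P(B) / P(A) = 0.333...
print(p_b_given_a, p_b_given_a * p_a)   # multiplying back by P(A) recovers the same joint, ~0.1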
Reinforcement Learning
Supervised Learning vs. Reinforcement Learning
Supervised Learning
Instruct correct answer/label
Reinforcement Learning
Evaluate action undertaken
Terms
Utility
Desirability of outcome
Utility Function
V(s), V: S -> Real
Calculates desirability of input state
Expected Utility
EV(a)
The expected value of an action to an agent, calculated by multiplying the value to the agent of each possible outcome of the action by the probability of that outcome occurring and then summing those numbers
Action
A transition between states in a given environment
P(s'|s,a)
Probability of being in s' upon taking action a from s
argmax F(x)
A function that chooses x such that it maximises F
Used in RL with a* = argmax_a EV(a), over all actions a in A, to choose the action that maximises expected utility.
Greedy
An algorithm/choice that may depend on choices made so far/data collected so far but not on future choices or all solutions to the problem
Exploration vs Exploitation
Sometimes select a non-greedy, uncertain choice over the certain greedy choice: the current model could be imperfect, the non-greedy choice could turn out to be better, and trying it improves the estimate.
I Function
I[a] = 1 if pred a is true; 0 otherwise
Qt(a) Function
Qt(a) calculates the average reward for a given action up to time t
This value is an estimate of the true value
Calculated by dividing the sum of the rewards for that action by the number of times that action was used.
Nt(a)
Function that calculates how many times an action was taken up to time t
q*(a) Function
q*(a) is the actual expected reward for action a.
Qt(a) converges to q*(a) as Nt(a) -> infinity
A greedy algorithm maximises Qt(a)
Exploration
Collect data on which inputs give the best reward
With probability e, explore non-greedy options
How do you choose epsilon (e)?
Epsilon-First Method
0<=t<=T where T = number of actions:
Explore for 0<=t<=eT
Exploit for eT<=t<=T
Where:
e < 1 (epsilon)
t = current action number
Epsilon-Greedy Method
At each choice:
Exploit optimal choice with probability (1-e)
Explore one of remaining (k-1) arms with probability e
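An epsilon-greedy bandit sketch using the sample-average estimate Qt(a); the Gaussian reward model and the arm means are made-up assumptions for illustration:
import random

def epsilon_greedy_bandit(true_means, steps=1000, epsilon=0.1):
    k = len(true_means)
    Q = [0.0] * k                     # Qt(a): sample-average estimate of each arm's value
    N = [0] * k                       # Nt(a): how many times each arm has been pulled
    total_reward = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(k)                   # explore: random arm
        else:
            a = max(range(k), key=lambda i: Q[i])     # exploit: greedy arm
        reward = random.gauss(true_means[a], 1.0)     # assumed noisy reward
        N[a] += 1
        Q[a] += (reward - Q[a]) / N[a]                # incremental sample average
        total_reward += reward
    return Q, total_reward

print(epsilon_greedy_bandit([0.2, 0.5, 0.8]))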
Goal
Optimise strategy to maximise long-term rewards
The earlier the expected rewards converge, the better the strategy
Exploitation
Use the data collected in exploration to choose the best input
Exploits current knowledge to maximise immediate reward
Markov Decision Process
Action chosen dependent on situation/state
Agent receives evaluative feedback
Estimating q*(s, a), the best action to take given the state
Rather than q*(a) as in the bandit problem
Sequential Decision Making
Actions (a) influence immediate rewards and future states.
Trajectory Looks like:
S0, A0, R1, S1, A1, R2, S2, A2, R3 ...
Delayed and immediate rewards tradeoff
Markov Assumption
At each step, the next state and reward depend only on the current state and the action taken; no memory
Markovian Dynamics
State and reward at time t (St, Rt) depend only on S(t-1) and A(t-1). No memory used.
Types of MDP
Finite
An instance of an MDP where the states, actions, and rewards (S, A, R) are finite
Infinite
Policy
A rule for choosing which action to take in each state, which determines the rewards received
Deterministic
Probability of action = 1
Non-deterministic
pi_t(a | s) = P(A_t = a | S_t = s)
For reinforcement learning: change policy to maximise reward in the long run.
Rewards
Future Rewards
Gt = R(t+1) + R(t+2) + ... + R(T)
Discounted Future Rewards
Gt = R(t+1) + y*R(t+2) + (y^2)*R(t+3) + ...
Where
0 <= y < 1
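Computing a discounted return Gt from a list of future rewards, with a made-up reward sequence and discount factor y (gamma):
def discounted_return(rewards, gamma=0.9):
    # rewards = [R(t+1), R(t+2), ...]; Gt = R(t+1) + gamma*R(t+2) + gamma^2*R(t+3) + ...
    return sum((gamma ** i) * r for i, r in enumerate(rewards))

print(discounted_return([1, 1, 1, 10]))   # later rewards are weighted down by gamma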
Policy Comparison
Policy pi is better than pi' if the expected reward V_pi(s) >= V_pi'(s) for all s in S
Policy Function pi
pi(a|s)
Tells you which action to take when you're in state s
Optimal Policy
The policy that is better than all others
Optimal Value Functions
Optimal state-value function V*
V*(s) = max_pi V_pi(s), for all s in S
For all policies pi, choose the one that gives the largest Value for each state.
Optimal action-value function Q*
Q*(s, a) = max_pi Q_pi(s, a), for all s in S, a in A
Used to keep track of reward received for specific actions. Choose the one that maximises the action value function Q(pi)
Learn Policy
Learn a map from states to probability of action based on rewards
Reinforcement Learning
Updating policy as result of experience to maximise reward in the long run
Learn optimal action policy
pi* : S -> A
Map from states to actions that maximises the value V_pi(s) - the expected discounted future reward
pi* = argmax_pi V_pi(s) for all s in S
Unlike supervised learning, to learn pi*, we do not have examples of form (state, action), instead we have ((state, action), reward)
Calculating the value of a state
Value of state s given policy pi
V_pi(s) = E_pi[Gt | St = s]
Calculates expected discounted future reward used to guide future actions
However, in Reinforcement Learning, we do not know pi(a|s) or p(s',r | s, a)
Can be expressed using the Bellman equation for V
That expresses the value of the state in terms of the values of the successor states
Estimating Vpi(s)
For RL
Temporal Difference
TD(lambda)
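A minimal TD(0) sketch for estimating V_pi(s) from experience, as the last nodes suggest; env.reset()/env.step() and policy(state) are hypothetical interfaces, not from the source:
from collections import defaultdict

def td0_value_estimation(env, policy, episodes=1000, alpha=0.1, gamma=0.9):
    # Assumes env.reset() -> state and env.step(action) -> (next_state, reward, done),
    # and policy(state) -> action; both interfaces are assumptions for this sketch.
    V = defaultdict(float)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            action = policy(state)
            next_state, reward, done = env.step(action)
            # TD(0): move V(s) towards the bootstrapped target r + gamma * V(s').
            V[state] += alpha * (reward + gamma * V[next_state] - V[state])
            state = next_state
    return dict(V)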