ALDS
Algorithms and data structures
Algorithms
Complexity
Asymptotic Notation
Types
Tight Bound(Θ)
Upper Bound(O)
Non-Tight Upper Bound(o)
f(n)/g(n) -> 0 as n -> inf
Non-Tight Lower Bound(ω)
f(n)/g(n) -> inf as n -> inf
Lower Bound(Ω)
Properties
if f(n) = O(g(n)) then c*f(n) = O(g(n)) for any constant c > 0
Reflexive
f(n) = O(f(n))
f(n) = Ω(f(n))
f(n) = Θ(f(n))
Transitive
if f(n) = O(g(n)) and g(n) = O(h(n)) then f(n) = O(h(n))
if f(n) = Θ(g(n)) and g(n) = Θ(h(n)) then f(n) = Θ(h(n))
if f(n) = Ω(g(n)) and g(n) = Ω(h(n)) then f(n) = Ω(h(n))
if f(n) = ω(g(n)) and g(n) = ω(h(n)) then f(n) = ω(h(n))
Symmetric
If f(n) = Θ(g(n)) then g(n) = Θ(f(n))
Transpose Symmetric
if f(n) = O(g(n)) then g(n) is Ω(f(n))
if f(n) = o(g(n)) then g(n) is ω(f(n))
if f(n) = O(g(n)) and f(n) = Ω(g(n)) then f(n) = Θ(g(n))
if f(n) = O(g(n)) and d(n) = O(e(n)) then f(n) + d(n) = O(max(g(n), e(n)))
if f(n) = O(g(n)) and d(n) = O(e(n)) then f(n) * d(n) = O(g(n) * e(n))
f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n))
Asymptotically Smaller
We say that f(n) is asymptotically smaller than g(n) if f(n) = o(g(n))
Asymptotically Larger
We say that f(n) is asymptotically larger than g(n) if f(n) = ω(g(n))
Trichotomy
a < b, a = b, or a > b
Trichotomy does not carry over to asymptotic notation: some pairs of functions are not asymptotically comparable
Complexity Analysis
Iterative Functions
Recursive Functions
Backward Substitution
Recursion Tree
Master Theorem
Comparison of function complexity
Use log to simplify
Space Complexity
Highest stack usage
Standard Formulas
∑ (r = 1 to n) 1/r ≈ ∫ (x = 1 to n) dx/x = log(n)
Tricks
Identify log(n)
If the loop variable is multiplied or divided by a constant on each iteration, then the loop runs log(n) times
Increasing
Initial value 1
Final value < n
Increment by multiplying by k
Example
n = 8, k = 2
1, 2, 4 => 3 times = log2(8)
Decreasing
Initial value n
Final value > 1
Decrement by dividing by k
Example
n = 8, k = 2
8, 4, 2 => 3 times = log2(8)
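As a quick check of this rule, a small Python sketch (the helper name is mine) that counts the iterations of a multiplying loop:

```python
import math

def count_multiplying_steps(n, k):
    """Count iterations of: i = 1; while i < n: i *= k."""
    i, steps = 1, 0
    while i < n:
        i *= k
        steps += 1
    return steps

# For n = 8, k = 2 the loop body runs 3 times, i.e. log2(8).
assert count_multiplying_steps(8, 2) == 3 == int(math.log2(8))
```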
Use-Cases
Maximum number of comparisons to merge a set of sorted arrays with best strategy
For two arrays A1 (size m) and A2 (size n)
Worst case: all the elements of A1 lie between the last two elements of A2
All the elements of A2 except the last one are compared with the first element of A1 and copied to the output array = n - 1 comparisons
All the elements of A1 will be compared with the last element of A2 = m comparisons
Total comparisons = m + n - 1
Comparing the smallest sized two arrays will be the best strategy
Merging a small array with a large one increases the overall comparisons
The two arrays are merged, and then the algorithm is repeated with the combined array
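A Python sketch of the "merge the two smallest arrays first" strategy, counting the worst-case m + n - 1 comparisons per merge (function name and sizes are illustrative):

```python
import heapq

def worst_case_comparisons(sizes):
    """Greedy strategy: repeatedly merge the two smallest arrays.
    Merging arrays of sizes m and n costs at most m + n - 1 comparisons."""
    sizes = list(sizes)
    heapq.heapify(sizes)
    total = 0
    while len(sizes) > 1:
        m = heapq.heappop(sizes)
        n = heapq.heappop(sizes)
        total += m + n - 1            # worst-case comparisons for this merge
        heapq.heappush(sizes, m + n)  # merged array takes part in later merges
    return total

print(worst_case_comparisons([2, 3, 5]))  # (2+3-1) + (5+5-1) = 13
```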
Divide And Conquer
Dynamic Programming
Dynamic programming, like the divide-and-conquer method, solves problems by combining the solutions to subproblems.
Dynamic programming applies when the subproblems overlap—that is, when subproblems share subsubproblems
Steps
Characterize the structure of an optimal solution
Recursively define the value of an optimal solution
Compute the value of an optimal solution, typically in a bottom-up fashion
Construct an optimal solution from computed information
Examples
Rod Cutting
Top-Down
Bottom-Up
Print solution
Steps
For each length i of rod from 1 to n
For each rod length j from 1 to i
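A minimal bottom-up rod-cutting sketch following the two loops above; the price table is the usual illustrative one, not taken from this outline:

```python
def cut_rod(prices, n):
    """Bottom-up rod cutting.  prices[j] = price of a rod of length j+1."""
    best = [0] * (n + 1)               # best[i] = max revenue for length i
    for i in range(1, n + 1):          # for each rod length i from 1 to n
        for j in range(1, i + 1):      # first piece has length j, from 1 to i
            best[i] = max(best[i], prices[j - 1] + best[i - j])
    return best[n]

# Classic price table for lengths 1..10.
prices = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
print(cut_rod(prices, 4))  # 10 (two pieces of length 2)
```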
Optimal BST
Cormen
Optimal Substructure
If an optimal binary search tree T has a subtree T' containing keys k_i, ..., k_j, then this subtree T' must be optimal as well for the subproblem with keys k_i, ..., k_j and dummy keys d_(i-1), ..., d_j
When a subtree becomes a subtree of a node, the depth of each node in the subtree increases by 1
The expected search cost of this subtree increases by the sum of all the probabilities in the subtree
Subset Sum
(Row, Col) = True means that the sum Col can be formed using elements of the array up to index Row
(*, 0) is all True, as a sum of 0 can be formed by skipping all elements of the array
Set (i, A[i]) to True, as a single element of the array can satisfy the corresponding sum
For other cells apply (Row, Col) = (Row - 1, Col) OR (Row - 1, Col - A[Row])
Create a table with Sum + 1 columns (0..Sum) and rows equal to the size of the array
Time Complexity
O(Sum*Size of Array)
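A sketch of the boolean table described above (variable names are mine):

```python
def subset_sum(arr, target):
    """dp[row][col] is True if sum `col` can be formed from arr[0..row]."""
    rows, cols = len(arr), target + 1
    dp = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        dp[r][0] = True                      # sum 0: skip every element
    if arr[0] <= target:
        dp[0][arr[0]] = True                 # a single element forms its own sum
    for r in range(1, rows):
        for c in range(1, cols):
            dp[r][c] = dp[r - 1][c]          # exclude arr[r]
            if arr[r] <= c:
                dp[r][c] = dp[r][c] or dp[r - 1][c - arr[r]]  # include arr[r]
    return dp[rows - 1][target]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
```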
Matrix Multiplication
Partition all possibilities
Technique
Identify the largest matrix, which induces a large number of multiplications
Remove candidate parenthesizations that cause large multiplication counts
Reduce multiplications by grouping so that the large matrix is multiplied with small matrices
Longest Common Subsequence
Algorithm
if A[i] == B[j]
1 + LCS(i+1, j+1)
else
max(LCS(i+1, j), LCS(i, j+1))
Time Complexity
O(mn)
Print instances of longest
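A bottom-up sketch of the same recurrence (names and the test strings are illustrative):

```python
def lcs_length(A, B):
    """Bottom-up LCS: dp[i][j] = LCS length of the suffixes A[i:] and B[j:]."""
    m, n = len(A), len(B)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if A[i] == B[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    return dp[0][0]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4 ("BCBA")
```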
0-1 Knapsack
Types
Top-Down with Memoization
Bottom Up
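A minimal bottom-up 0-1 knapsack sketch, assuming the usual value/weight/capacity formulation; the sample data is illustrative:

```python
def knapsack_01(values, weights, capacity):
    """dp[w] = best value achievable with capacity w."""
    dp = [0] * (capacity + 1)
    for v, wt in zip(values, weights):
        # iterate capacities downwards so each item is used at most once
        for w in range(capacity, wt - 1, -1):
            dp[w] = max(dp[w], dp[w - wt] + v)
    return dp[capacity]

print(knapsack_01([60, 100, 120], [10, 20, 30], 50))  # 220 (items 2 and 3)
```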
Subproblem Graphs
Reconstructing a solution
Greedy Algorithm
Steps
Determine the optimal substructure of the problem
Develop a recursive solution
Show that if we make the greedy choice, then only one subproblem remains
Prove that it is always safe to make the greedy choice
Develop a recursive algorithm that implements the greedy strategy
Convert the recursive algorithm to an iterative algorithm
Examples
Job Sequencing Algorithm
Algorithm
Sort all the given jobs in decreasing order of their profit
Check the value of maximum deadline
Draw a Gantt chart where maximum time on Gantt chart is the value of maximum deadline.
Pick up the jobs one by one
Put the job on Gantt chart as far as possible from 0 ensuring that the job gets completed before its deadline.
Time Complexity
O(nlog(n)) to sort by profit
O(n^2) for finding the next job
Total O(n^2)
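A sketch of the greedy procedure above with the O(n^2) latest-free-slot search (job data is illustrative):

```python
def job_sequencing(jobs):
    """jobs: list of (profit, deadline).  Returns (total_profit, schedule).
    Greedy: highest profit first, placed in the latest free slot before its deadline."""
    jobs = sorted(jobs, reverse=True)                 # sort by profit, descending
    max_deadline = max(d for _, d in jobs)
    slots = [None] * (max_deadline + 1)               # slot t covers time (t-1, t]
    total = 0
    for profit, deadline in jobs:
        for t in range(deadline, 0, -1):              # as far from 0 as possible
            if slots[t] is None:
                slots[t] = profit
                total += profit
                break
    return total, slots[1:]

print(job_sequencing([(100, 2), (19, 1), (27, 2), (25, 1), (15, 3)]))
# (142, [27, 100, 15])
```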
Huffman Coding
Algorithm
Create a leaf node for each character of the text
Leaf node of a character contains the occurring frequency of that character
Arrange all the nodes in increasing order of their frequency value
Consider the two nodes having minimum frequency
Create a new internal node
The frequency of this new node is the sum of frequency of those two nodes
Make the first node the left child and the other node the right child of the newly created node
Keep repeating steps until all the nodes form a single tree
The tree finally obtained is the desired Huffman Tree
Note
Keep smaller value as left sub-tree
Average Code Length = Sum(FREQi x LENGTHi)/Sum(FREQi)
Time Complexity
O(nlog(n)) for sorting by frequency
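A sketch of the algorithm above, using a min-heap in place of repeatedly re-sorting the nodes (names and frequencies are illustrative):

```python
import heapq

def huffman_codes(freq):
    """freq: dict mapping symbol -> frequency.  Returns dict of prefix codes."""
    # Heap entry: (frequency, tie_breaker, tree); a tree is a symbol or a (left, right) pair.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # two nodes of minimum frequency
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node
            walk(node[0], prefix + "0")       # smaller (first-popped) subtree on the left
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}))
```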
Activity Selection
Algorithm
Sort the activities according to their finishing time
Select the first activity from the sorted array and print it
Do the following for the remaining activities in the sorted array
If the start time of this activity is greater than or equal to the finish time of the previously selected activity, then select this activity and print it
Time Complexity
O(nlog(n)) if not sorted
O(n) if already sorted
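A sketch of the greedy selection by earliest finish time (activity data is illustrative):

```python
def select_activities(activities):
    """activities: list of (start, finish).  Greedy by earliest finish time."""
    selected = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:       # compatible with the last selected activity
            selected.append((start, finish))
            last_finish = finish
    return selected

print(select_activities([(5, 9), (1, 2), (3, 4), (0, 6), (5, 7), (8, 9)]))
# [(1, 2), (3, 4), (5, 7), (8, 9)]
```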
Sorting
Quick Sort
Overview
Partition
Pivot selection
Always pick first element as pivot
Always pick last element as pivot
Pick a random element as pivot
Pick median as pivot
Partitioning
Hoare Partition
Lomuto Partition
Complexity
Time complexity
Average
nlogn
Worst
n^2
Best
nlogn
Properties
Good locality of reference
Cache friendly
Tail recursion
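A sketch of quicksort with the Lomuto partition scheme mentioned above (last element as pivot; names are mine):

```python
def quicksort(a, lo=0, hi=None):
    """In-place quicksort using the Lomuto partition scheme."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = lomuto_partition(a, lo, hi)
        quicksort(a, lo, p - 1)
        quicksort(a, p + 1, hi)

def lomuto_partition(a, lo, hi):
    pivot = a[hi]                       # last element as pivot
    i = lo - 1                          # boundary of the "<= pivot" region
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]
    return i + 1

data = [5, 2, 9, 1, 5, 6]
quicksort(data)
print(data)  # [1, 2, 5, 5, 6, 9]
```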
Bubble Sort
Time complexity
Average
n^2
Worst
n^2
Best
n
Properties
Efficient for small lists
Fast for mostly sorted lists
Note
Assume bubble sort with a check after each pass for swaps; if no swap occurred, the list is already sorted
Insertion Sort
Time complexity
Average
n^2
Worst
n^2
Best
n
Properties
Similar to Binary Heap
Selection Sort
Time complexity
Average
n^2
Worst
n^2
Best
n^2
Properties
Preferred for arrays
Cache friendly
Minimum writes
Useful when swapping is expensive
Inefficient for large lists
Merge Sort
Time complexity
Average
nlogn
Worst
nlogn
Best
nlogn
Properties
Preferred for arrays
Preferred for large amounts of data
Shell Sort
Time complexity
Average
n^(4/3)
Worst
n^(3/2)
Best
nlogn
Algorithm
Compare and swap pairs of elements N/2 distance apart
Example: if N = 8 (gap 4), the pairs 0-4, 1-5, 2-6, 3-7 are compared and swapped
Repeat by decreasing distance till it becomes 1 and degenerates to insertion sort
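A sketch of the gap-halving scheme described above (degenerating to insertion sort at gap 1):

```python
def shell_sort(a):
    """Shell sort with the simple gap sequence N/2, N/4, ..., 1."""
    n = len(a)
    gap = n // 2
    while gap > 0:
        for i in range(gap, n):            # gapped insertion sort
            key = a[i]
            j = i
            while j >= gap and a[j - gap] > key:
                a[j] = a[j - gap]
                j -= gap
            a[j] = key
        gap //= 2                          # at gap == 1 this is plain insertion sort
    return a

print(shell_sort([23, 12, 1, 8, 34, 54, 2, 3]))  # [1, 2, 3, 8, 12, 23, 34, 54]
```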
Properties
Small code size
No use of call stack
Reasonably fast
More efficient for larger lists
Radix Sort
Time complexity
Average
(n+b) * logb(k)
Worst
(n+b) * logb(k)
Best
(n+b) * logb(k)
b is the base (number of buckets) and k is the maximum possible value
Apply counting sort for each digit
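A sketch of LSD radix sort with a stable counting sort per digit (base 10 assumed):

```python
def radix_sort(a, base=10):
    """LSD radix sort for non-negative integers: stable counting sort per digit."""
    if not a:
        return a
    k = max(a)
    exp = 1
    while k // exp > 0:
        a = counting_sort_by_digit(a, exp, base)
        exp *= base
    return a

def counting_sort_by_digit(a, exp, base):
    count = [0] * base
    for x in a:
        count[(x // exp) % base] += 1
    for d in range(1, base):                 # prefix sums give final positions
        count[d] += count[d - 1]
    out = [0] * len(a)
    for x in reversed(a):                    # reversed traversal keeps it stable
        count[(x // exp) % base] -= 1
        out[count[(x // exp) % base]] = x
    return out

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))  # [2, 24, 45, 66, 75, 90, 170, 802]
```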
External Sorting
Properties
In-Place
Stable
Adaptive
A sorting algorithm falls into the adaptive sort family if it takes advantage of existing order in its input
It benefits from the presortedness in the input sequence – or a limited amount of disorder for various definitions of measures of disorder – and sorts faster
Adaptive sorting is usually performed by modifying existing sorting algorithms
Example
Bubble Sort
Insertion Sort
Quick sort
Bucket Sort
Counting Sort
Tim Sort
We divide the array into blocks known as runs
We sort those runs using insertion sort one by one and then merge them using the merge routine from merge sort
If the size of the array is less than the run size, the array is sorted just by using insertion sort
Recursion
Types of recursion
Direct Recursion
Indirect Recursion
Nested Recursion
Excessive Recursion
Example
Towers of Hanoi
T(n) = 2T(n-1) + 1
Number of moves = 2^n - 1
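A short recursive sketch that makes the recurrence T(n) = 2T(n-1) + 1 visible in code:

```python
def hanoi(n, src="A", aux="B", dst="C"):
    """Move n disks from src to dst; returns the number of moves (2^n - 1)."""
    if n == 0:
        return 0
    moves = hanoi(n - 1, src, dst, aux)       # T(n-1): move n-1 disks out of the way
    print(f"move disk {n} from {src} to {dst}")
    moves += 1                                # the +1 in T(n) = 2T(n-1) + 1
    moves += hanoi(n - 1, aux, src, dst)      # T(n-1): move them onto the largest disk
    return moves

assert hanoi(3) == 2**3 - 1                   # 7 moves
```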
Tail Recursion
Recurrence
Subproblems are not necessarily constrained to being a constant fraction of the original problem size
Methods for solving recurrences
Substitution method
In the substitution method, we guess a bound and then use mathematical induction to prove our guess correct
Recursion-tree method
The recursion-tree method converts the recurrence into a tree whose nodes represent the costs incurred at various levels of the recursion.
We use techniques for bounding summations to solve the recurrence
Master method
Form 1
T(n) = aT(n/b) + f(n)
Case 1
If f(n) = O(n^(logb(a) - eps)) for some constant eps > 0
T(n) = Θ(n^(logb(a)))
Case 2
If f(n) = Θ(n^(logb(a)))
T(n) = Θ(n^(logb(a)) * lg(n))
Case 3
If f(n) = Ω(n^(logb(a) + eps)) for some constant eps > 0, and if a*f(n/b) <= c*f(n) for some constant c < 1 and all sufficiently large n
T(n) = Θ(f(n))
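A worked instance of Form 1, using merge sort's recurrence as my example (it falls under Case 2):

```latex
% Merge sort: T(n) = 2T(n/2) + \Theta(n), so a = 2, b = 2, f(n) = \Theta(n).
% n^{\log_b a} = n^{\log_2 2} = n = \Theta(f(n)), so Case 2 applies:
T(n) = \Theta\!\left(n^{\log_2 2} \lg n\right) = \Theta(n \lg n)
```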
Change of Variable
Form 2
T(n) = aT(n/b) + Θ(n^k*log^p(n))
Case 1
if a > b^k
1 more item...
Case 2
if a = b^k
3 more items...
Case 3
if a < b^k
2 more items...
Cases
Base Case
Recursive Case
Examples
The Maximum-Subarray Problem
Three Cases
Maximum Sub-Array in left half
Maximum Sub-Array in right half
Maximum Sub-Array passing through the middle
Time Complexity
O(nlogn)
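A sketch of the three-case divide-and-conquer solution (array values are the usual illustrative ones):

```python
def max_subarray(a, lo=0, hi=None):
    """Divide and conquer: best of (left half, right half, crossing the middle)."""
    if hi is None:
        hi = len(a) - 1
    if lo == hi:
        return a[lo]
    mid = (lo + hi) // 2
    left = max_subarray(a, lo, mid)
    right = max_subarray(a, mid + 1, hi)
    # best sum crossing the middle: grow outwards from mid and mid + 1
    best_left, s = float("-inf"), 0
    for i in range(mid, lo - 1, -1):
        s += a[i]
        best_left = max(best_left, s)
    best_right, s = float("-inf"), 0
    for i in range(mid + 1, hi + 1):
        s += a[i]
        best_right = max(best_right, s)
    return max(left, right, best_left + best_right)

print(max_subarray([13, -3, -25, 20, -3, -16, -23, 18, 20, -7, 12, -5, -22, 15, -4, 7]))  # 43
```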
Strassen’s Algorithm
Pad with 0s to convert to power of 2
Convert to block matrix
Express using a representation where we can reduce the number of multiplications
A recurrence is an equation or inequality that describes a function in terms of its value on smaller inputs
Backtracking
8-Queen Problem
For a given column, take each row position
Check if the position is valid
If valid do recursively for next column
Backtrack once all possibilities are exhausted for a column
Time Complexity
n^n worst case
Can be reduced to n!
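A backtracking sketch of the column-by-column procedure above (names are mine):

```python
def solve_n_queens(n=8):
    """Backtracking over columns: try each row in the current column, recurse, undo."""
    solutions, rows = [], []                   # rows[c] = chosen row for column c

    def valid(row, col):
        for c, r in enumerate(rows):
            if r == row or abs(r - row) == abs(c - col):   # same row or diagonal
                return False
        return True

    def place(col):
        if col == n:
            solutions.append(list(rows))
            return
        for row in range(n):
            if valid(row, col):
                rows.append(row)
                place(col + 1)
                rows.pop()                     # backtrack
    place(0)
    return solutions

print(len(solve_n_queens(8)))  # 92 solutions
```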
Data Structures
Hashing
Direct-address tables
Direct addressing is a simple technique that works well when the universe U of keys is reasonably small
Suppose that an application needs a dynamic set in which each element has a key drawn from the universe U = {0, 1, ..., m - 1}, where m is not too large
We shall assume that no two elements have the same key
To represent the dynamic set, we use an array, or direct-address table, denoted by T[0..m-1], in which each position, or slot, corresponds to a key in the universe U
For some applications, the direct-address table itself can hold the elements in the dynamic set
That is, rather than storing an element’s key and satellite data in an object external to the direct-address table, with a pointer from a slot in the table to the object, we can store the object in the slot itself, thus saving space
We would use a special key within an object to indicate an empty slot
Moreover, it is often unnecessary to store the key of the object, since if we have the index of an object in the table, we have its key
If keys are not stored, however, we must have some way to tell whether the slot is empty
Hash Tables
When the set K of keys stored in a dictionary is much smaller than the universe U of all possible keys, a hash table requires much less storage than a direct address table
We can reduce the storage requirement to Θ(|K|) while we maintain the benefit that searching for an element in the hash table still requires only O(1) time.
The catch is that this bound is for the average-case time, whereas for direct addressing it holds for the worst-case time
With hashing, this element is stored in slot h(k) that is, we use a hash function h to compute the slot from the key k
h maps the universe U of keys into the slots of a hash table T[0..m-1]: h : U -> {0, 1, ..., m - 1}, where the size m of the hash table is typically much less than |U|
We say that an element with key k hashes to slot h(k)
We also say that h(k) is the hash value of key k
The hash function reduces the range of array indices and hence the size of the array
Instead of a size of |U|, the array can have size m
Collision
Two keys may hash to the same slot. We call this situation a collision
Fortunately, we have effective techniques for resolving the conflict created by collisions
Solutions
Hash Function
The very term "to hash," evoking images of random mixing and chopping, captures the spirit of this approach
A hash function h must be deterministic in that a given input k should always produce the same output h(k)
Because |U| > m, however, there must be at least two keys that have the same hash value; avoiding collisions altogether is therefore impossible
One idea is to make h appear to be “random,” thus avoiding collisions or at least minimizing their number
We might try to achieve this goal by choosing a suitable hash function h
Chaining
In chaining, we place all the elements that hash to the same slot into the same linked list
Insertion
The worst-case running time for insertion is O(1)
The insertion procedure is fast in part because it assumes that the element x being inserted is not already present in the table
If necessary, we can check this assumption (at additional cost) by searching for an element whose key is x.key before we insert
Search
For searching, the worst case running time is proportional to the length of the list
Θ(1 + α) on average under simple uniform hashing, where α is the load factor
Delete
We can delete an element in O(1) time if the lists are doubly linked
CHAINED-HASH-DELETE takes as input an element x and not its key k, so that we don’t have to search for x first
If the hash table supports deletion, then its linked lists should be doubly linked so that we can delete an item quickly
If the lists were only singly linked, then to delete element x, we would first have to find x in the list T[h(x.key)] so that we could update the next attribute of x’s predecessor
With singly linked lists, both deletion and searching would have the same asymptotic running times
Analysis
Load Factor
3 more items...
Cases
2 more items...
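A minimal chaining sketch, with Python lists standing in for the linked lists and Python's built-in hash as a stand-in for h(k):

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of (key, value) pairs."""
    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]
        self.m = m

    def _hash(self, key):
        return hash(key) % self.m            # stand-in for h(k)

    def insert(self, key, value):            # O(1): append to the chain
        self.slots[self._hash(key)].append((key, value))

    def search(self, key):                   # O(1 + alpha) expected
        for k, v in self.slots[self._hash(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self._hash(key)]
        self.slots[self._hash(key)] = [(k, v) for k, v in chain if k != key]

t = ChainedHashTable()
t.insert("a", 1); t.insert("b", 2)
print(t.search("a"), t.search("z"))  # 1 None
```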
Hash Function
A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to
Unfortunately, we typically have no way to check this condition, since we rarely know the probability distribution from which the keys are drawn
Moreover, the keys might not be drawn independently
Occasionally we do know the distribution
In practice, we can often employ heuristic techniques to create a hash function that performs well
Qualitative information about the distribution of keys may be useful in this design process
A good approach derives the hash value in a way that we expect to be independent of any patterns that might exist in the data
Some applications of hash functions might require stronger properties than are provided by simple uniform hashing
Types
3 more items...
Interpreting keys as natural numbers
3 more items...
Open Addressing
In open addressing, all elements occupy the hash table itself
That is, each table entry contains either an element of the dynamic set or NIL
When searching for an element, we systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table
No lists and no elements are stored outside the table, unlike in chaining
In open addressing, the hash table can "fill up" so that no further insertions can be made; one consequence is that the load factor α can never exceed 1
The advantage of open addressing is that it avoids pointers altogether. Instead of following pointers, we compute the sequence of slots to be examined
The extra memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory, potentially yielding fewer collisions and faster retrieval
Probe
To perform insertion using open addressing, we successively examine, or probe, the hash table until we find an empty slot in which to put the key
Instead of being fixed in the order 0, 1, ..., m-1 (which requires Θ(n) search time), the sequence of positions probed depends upon the key being inserted
Probe Number
To determine which slots to probe, we extend the hash function to include the probe number (starting from 0) as a second input
The hash function becomes h : U x {0, 1, ..., m-1} -> {0, 1, ..., m-1}
Auxiliary Hash Function
With open addressing, we require that for every key k, the probe sequence be a permutation of (0, 1, ..., m-1)
Types
Linear Probing
Linear probing searches the table for the closest following free location and inserts the new key there
Lookups are performed in the same way, by searching the table sequentially starting at the position given by the hash function, until finding a cell with a matching key or an empty cell
m Probing Sequences
Quadratic Probing
The ith probe looks at location (h(k) + i^2) % m
m probing sequences
Double Hashing
The ith probe looks at (h1(k) + i*h2(k))%m
Has m^2 probing sequences
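A sketch that generates the three probe sequences side by side; h1 and h2 are illustrative division-method choices, not prescribed by the source:

```python
def probe_sequence(key, m, scheme="linear"):
    """Yield the slots examined for `key` in a table of size m."""
    h1 = key % m
    h2 = 1 + key % (m - 1)                 # must never be 0 for double hashing
    for i in range(m):
        if scheme == "linear":
            yield (h1 + i) % m             # h(k, i) = (h1(k) + i) mod m
        elif scheme == "quadratic":
            yield (h1 + i * i) % m         # h(k, i) = (h1(k) + i^2) mod m
        else:
            yield (h1 + i * h2) % m        # h(k, i) = (h1(k) + i*h2(k)) mod m

print(list(probe_sequence(42, 7, "linear")))     # [0, 1, 2, 3, 4, 5, 6]
print(list(probe_sequence(42, 7, "quadratic")))  # [0, 1, 4, 2, 2, 4, 1] - may revisit slots
print(list(probe_sequence(42, 7, "double")))     # [0, 1, 2, 3, 4, 5, 6]
```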
Clustering
Primary Clustering
Primary clustering is the tendency for a collision resolution scheme such as linear probing to create long runs of filled slots near the hash position of keys.
If the primary hash index is x, subsequent probes go to x+1, x+2, x+3 and so on; this results in primary clustering
Once a primary cluster forms, the bigger the cluster gets, the faster it grows, and performance degrades
Secondary Clustering
Secondary clustering is the tendency for a collision resolution scheme such as quadratic probing to create long runs of filled slots away from the hash position of keys.
If the primary hash index is x, probes go to x+1, x+4, x+9, x+16, x+25 and so on; this results in secondary clustering
Secondary clustering causes a smaller performance hit than primary clustering; quadratic probing is an attempt to keep clusters from forming
The idea is to probe more widely separated cells, instead of those adjacent to the primary hash site
Clustering severity: Double Hashing < Quadratic Probing < Linear Probing
Probe Sequence
Seed
Start value of a probing sequence
Total Possible Probe Sequences is m factorial
Good probing should generate m factorial sequences
Deletion
Deletion from an open-address hash table is difficult
When we delete a key from slot i, we cannot simply mark that slot as empty by storing NIL in it
Doing so might make it impossible to retrieve any key k during whose insertion we had probed slot i and found it occupied
Insertion
The best case is O(1) and worst case is O(N) for all of these
Search
1/(1 − α)
α < 1 so m > n
Perfect Hashing
We call a hashing technique perfect hashing if O(1) memory accesses are required to perform a search in the worst case
To create a perfect hashing scheme, we use two levels of hashing, with universal hashing at each level
Instead of making a linked list of the keys hashing to slot j , however, we use a small secondary hash table Sj with an associated hash function hj
The first level is essentially the same as for hashing with chaining: we hash the n keys into m slots using a hash function h carefully selected from a family of universal hash functions
By choosing the hash functions hj carefully, we can guarantee that there are no collisions at the secondary level
Performance
Number of slots in hash table = m
Number of keys to be inserted = n
Load factor = n/m
Expected time to search = O(1 + LoadFactor)
Expected time to insert/delete = O(1 + LoadFactor)
Tree
A tree is a non-linear data structure that organizes data in a hierarchical structure; the definition is recursive
Properties
There is one and only one path between every pair of vertices in a tree
A tree with n vertices has exactly (n-1) edges
A graph is a tree if and only if it is minimally connected
Any connected graph with n vertices and (n-1) edges is a tree
Terminology
Root
Edge
Parent
Child
Siblings
Degree
Degree of a node
Total number of children of that node
Degree of a tree
Highest degree of a node among all the nodes in the tree
Internal Node
Leaf Node
Level
Height
Depth
Subtree
Forest
Set of disjoint trees
Binary Search Tree
Properties
Each node can have a maximum of two children
Every key in the left sub-tree is less than the key of its parent (root) node
Every key in the right sub-tree is greater than or equal to the key of its parent (root) node
The maximum number of nodes at level l is 2^(l-1). Here the level of a node is the number of nodes on the path from the root to it, including the root itself, so the root is at level 1
The maximum number of nodes in a binary tree of height h is 2^h - 1. Here height is the maximum number of nodes on a root-to-leaf path, so a tree with one node has height 1
In a binary tree with n nodes, the minimum possible height (minimum number of levels) is ceil(log2(n+1)). If the height of a leaf node is taken as 0, the formula becomes ceil(log2(n+1)) - 1
A binary tree with L leaves has at least ceil(log2(L)) + 1 levels
Operations
Search
Insert
Pre-Order Traversal
Visit Root, Left Sub-Tree, Right Sub-Tree
In-Order Traversal
Visit Left Sub-Tree, Root, Right Sub-Tree
Post-Order Traversal
Visit Left Sub-Tree, Right Sub-Tree, Root
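A minimal BST sketch with insert, search, and in-order traversal (names and keys are illustrative):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:                                   # equal keys go to the right subtree
        root.right = insert(root.right, key)
    return root

def search(root, key):
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root

def inorder(root):
    return inorder(root.left) + [root.key] + inorder(root.right) if root else []

root = None
for k in [8, 3, 10, 1, 6, 14]:
    root = insert(root, k)
print(inorder(root))                                              # [1, 3, 6, 8, 10, 14]
print(search(root, 6) is not None, search(root, 7) is not None)   # True False
```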
Balanced BST
AVL Tree
Balance Factor
RB Tree
Binary Heap
Types
Max-Heap
Min-Heap
Properties
The key at a node is greater than or equal to (max-heap) or less than or equal to (min-heap) the keys of its children
It’s a nearly complete binary tree
All levels are completely filled except possibly the last level, and the last level has all keys as far left as possible
This property of Binary Heap makes them suitable to be stored in an array
Operations
getMin()
extractMin()
decreaseKey()
insert()
delete()
Application
Max Heap is used to sort in ascending order
Min-Heaps are used to implement priority queues
Implementation
MAX-HEAPIFY
Algo
l = LEFT(i)
r = RIGHT(i)
if l <= A.heap-size and A[l] > A[i]
largest = l
else
largest = i
if r <= A.heap-size and A[r] > A[largest]
largest = r
if largest != i
exchange A[i] with A[largest]
MAX-HEAPIFY(A, largest)
O(log(n))
BUILD-MAX-HEAP
Algo
A.heap-size = A.length
for i = floor(A.length/2) downto 1
MAX-HEAPIFY(A, i)
floor(A.length/2) is the index of the last non-leaf node with 1-based indexing; nodes floor(A.length/2)+1 .. A.length are leaves
O(n)
Queue
Circular Queue using Singly Linked List
Fixed Size
Variable Sized
Insertion
new->next = head
rear->next = new
rear = new
Deletion
old = head
head = head->next
rear->next = head
delete old
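A sketch of this circular queue, keeping only a rear pointer whose next is the head (class and method names are mine):

```python
class Node:
    def __init__(self, value):
        self.value, self.next = value, None

class CircularQueue:
    """Circular singly linked list; rear.next is always the front (head)."""
    def __init__(self):
        self.rear = None

    def enqueue(self, value):
        new = Node(value)
        if self.rear is None:
            new.next = new                 # single node points to itself
        else:
            new.next = self.rear.next      # new->next = head
            self.rear.next = new           # rear->next = new
        self.rear = new                    # rear = new

    def dequeue(self):
        if self.rear is None:
            raise IndexError("queue is empty")
        head = self.rear.next              # old = head
        if head is self.rear:              # last remaining node
            self.rear = None
        else:
            self.rear.next = head.next     # head = head->next; rear->next = head
        return head.value

q = CircularQueue()
for x in (1, 2, 3):
    q.enqueue(x)
print(q.dequeue(), q.dequeue(), q.dequeue())  # 1 2 3
```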
Stack
Linked List
Singly Linked
Doubly Linked
Circular