Please enable JavaScript.
Coggle requires JavaScript to display documents.
Joins - Coggle Diagram
Joins
Block Nested Loop Join
Algorithm
-
-
Inner loop: for each page in S, for each tuple, probe hash table
-
-
-
-
Hash Join
-
Bucket size
-
-
when matching, want to do 1 pass using BNLJ =>
one of two relations is <= B-2
I/O cost
-
-
-
if Mr <= B-1, HJ only one pass = BNLJ 1 pass
Comparison with BNLJ
Mr <= B-2
both run in 1 pass, same cost
-
-
-
Index Nested loop join
-
I/O cost = Mr + |R| * I*,
|R| = # of tuples, I* = I/O cost of seaching an index
-
-
Sort Merge Join
Basic
Algorithm
- Sort R and S on join attribute, using external merge sort
- Read sorted relations in buffer and merge
if already sorted, skip first step
Step 2 I/O = Mr + Ms, if no duplicates / backup
Duplicates: merging not linear, need to backup, worst Mr * Ms => single join attribute
-
Total I/O = sort(R) + sort(S) + Mr + Ms, best case, key-foreign key
Optimized
Algorithm
1 pass: generated sorted runs of size ~2B (replacement sort),
for R and S
-
-
Buffer size?
-
-
to do k-wat merge, need k+1 buffers
(Mr + Ms) / 2B <= B - 1
=> B^2 >= max{Ms, Mr}
=> 3(Mr + Ms)
Comparison with HJ
-
2-pass join, SMJ needs B^2 > Ms
2-pass join, HJ needs B^2 > Mr
-
SMJ better if need to sort anyway, better if skewed data
if B+ tree on search key (already sorted?), SMJ better
General Join
-
Inequality
-
-
BNLJ: always apply, default