Chapter 9: Data Integration
Union
Based on the assumption that datasets A and C have multiple columns in common
A union stacks one dataset on top of the other, lining up the columns that contain similar information
Combines two datasets
Use when the datasets contain unique sets of cases that share the same or very similar columns
Useful when a lot of information spread across sources needs to be integrated into one system
If the datasets do not overlap, use a union; it is good for combining lists of different people
Makes it possible to check for duplicate records once the lists are combined
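A minimal Python sketch of the union described above. It assumes each dataset is a list of rows (dicts) with the same column names. The tables list_a and list_c, the sample rows, and the helpers union and find_duplicates are hypothetical. This union keeps every row (like SQL's UNION ALL), so checking for duplicates is a separate step.

```python
# Hypothetical example data: two lists of people sharing the same columns
list_a = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
list_c = [
    {"customer_id": 3, "name": "Cho"},
    {"customer_id": 2, "name": "Ben"},  # the same person appears in both lists
]


def union(*tables):
    """Stack tables on top of one another, keeping every row (no de-duplication)."""
    combined = []
    for table in tables:
        combined.extend(table)
    return combined


def find_duplicates(rows):
    """Return rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


stacked = union(list_a, list_c)
print(len(stacked))              # 4
print(find_duplicates(stacked))  # [{'customer_id': 2, 'name': 'Ben'}]
```

Ben appears in both lists, so the check flags one duplicate row; whether to drop it is a separate decision.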
Join
Combines two datasets with a shared identity value, such as a customer identifier
Through a join, one or more new rows are created, each containing information from both tables
Takes a lot of experience to understand fully
Two Types
Inner Join
Each row in the left table is combined horizontally with every row in the right table that has the same identity value; rows without a match in the other table are left out
Used with carefully curated databases
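A minimal sketch of an inner join in plain Python, assuming rows are dicts and customer_id is the shared identity value. The customers and orders tables and the inner_join helper are hypothetical. A simple nested loop is used for readability; database engines typically use indexes or hash tables to produce the same result faster.

```python
# Hypothetical example tables sharing the identity column "customer_id"
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cho"},
]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 1, "item": "pen"},
    {"customer_id": 4, "item": "lamp"},
]


def inner_join(left, right, key):
    """Combine each left row with every right row that has the same key value."""
    result = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                # Merge the two rows side by side into one new row
                result.append({**l_row, **r_row})
    return result


for row in inner_join(customers, orders, "customer_id"):
    print(row)
# {'customer_id': 1, 'name': 'Ana', 'item': 'book'}
# {'customer_id': 1, 'name': 'Ana', 'item': 'pen'}
```

Ben and Cho have no orders and the order for customer 4 has no customer, so none of those rows appear in the result.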
Outer Join
Left Outer Join
This join produces the same result as an inner join but also adds any rows from the left table that do not have corresponding rows in the right table
Use when the goal of a project is to collect all information; the table with the most information goes first (on the left)
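A minimal sketch of a left outer join under the same assumptions: rows are dicts and customer_id is the identity value. The tables, the left_outer_join helper, and its right_columns parameter are hypothetical. Left rows with no match are kept, and the right table's columns are filled with None, which stands in for SQL's NULL here.

```python
# Hypothetical example tables sharing the identity column "customer_id"
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 4, "item": "lamp"},
]


def left_outer_join(left, right, key, right_columns):
    """Inner-join rows plus every left row that has no match on the right."""
    result = []
    for l_row in left:
        matches = [r_row for r_row in right if r_row[key] == l_row[key]]
        if matches:
            for r_row in matches:
                result.append({**l_row, **r_row})
        else:
            # No match: keep the left row and leave the right-only columns empty
            result.append({**l_row, **{col: None for col in right_columns}})
    return result


for row in left_outer_join(customers, orders, "customer_id", ["item"]):
    print(row)
# {'customer_id': 1, 'name': 'Ana', 'item': 'book'}
# {'customer_id': 2, 'name': 'Ben', 'item': None}
```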
Right Outer Join
This join produces the same result as an inner join, but also adds any rows from the right table that do not have corresponding rows in the left table
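A minimal sketch of a right outer join, mirroring the left outer join: rows are dicts, customer_id is the identity value, and the tables, the right_outer_join helper, and its left_columns parameter are hypothetical. Right rows with no match are kept, with the left table's columns set to None.

```python
# Hypothetical example tables sharing the identity column "customer_id"
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 4, "item": "lamp"},
]


def right_outer_join(left, right, key, left_columns):
    """Inner-join rows plus every right row that has no match on the left."""
    result = []
    for r_row in right:
        matches = [l_row for l_row in left if l_row[key] == r_row[key]]
        if matches:
            for l_row in matches:
                result.append({**l_row, **r_row})
        else:
            # No match: keep the right row and leave the left-only columns empty
            empty_left = {col: None for col in left_columns}
            result.append({key: r_row[key], **empty_left, **r_row})
    return result


for row in right_outer_join(customers, orders, "customer_id", ["name"]):
    print(row)
# {'customer_id': 1, 'name': 'Ana', 'item': 'book'}
# {'customer_id': 4, 'name': None, 'item': 'lamp'}
```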
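Full Outer Join
This join produces the same result as an inner join, plus the unmatched left rows that a left outer join adds and the unmatched right rows that a right outer join adds (each matched pair appears only once)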
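A minimal sketch of a full outer join, again assuming rows are dicts and customer_id is the identity value; the tables, the full_outer_join helper, and its column-list parameters are hypothetical. It tracks which right rows found a partner so that matched pairs are emitted once, then appends the right rows that never matched.

```python
# Hypothetical example tables sharing the identity column "customer_id"
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cho"},
]
orders = [
    {"customer_id": 1, "item": "book"},
    {"customer_id": 1, "item": "pen"},
    {"customer_id": 4, "item": "lamp"},
]


def full_outer_join(left, right, key, left_columns, right_columns):
    """Inner-join rows plus unmatched left rows plus unmatched right rows."""
    result = []
    matched_right = set()  # indexes of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if l_row[key] == r_row[key]:
                result.append({**l_row, **r_row})
                matched_right.add(i)
                found = True
        if not found:
            # Unmatched left row: right-only columns are empty
            result.append({**l_row, **{col: None for col in right_columns}})
    for i, r_row in enumerate(right):
        if i not in matched_right:
            # Unmatched right row: left-only columns are empty
            empty_left = {col: None for col in left_columns}
            result.append({key: r_row[key], **empty_left, **r_row})
    return result


rows = full_outer_join(customers, orders, "customer_id", ["name"], ["item"])
print(len(rows))  # 5
```

With this data that is 2 inner-join rows + 2 unmatched customers (Ben, Cho) + 1 unmatched order (customer 4) = 5 rows.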