Please enable JavaScript.
Coggle requires JavaScript to display documents.
Chapter 9: Data Integration (Joins (To access more features, Combines two…
Chapter 9: Data Integration
Joins
To access more features
Combines two datasets with a shared identify value
Ex: two datasets with CustomerID
Look at Table 9.2 - Table 9.10 examples to understand how joins are applied
Helps reduce tables in order to be suitable for machine learning
Unions
to access more observations
Based on the assumption that there are multiple columns in common between two tables and creates a new table
Ex: if records stored in various datasets, union will create one table
Used when datasets contain unique sets of cases sharing the same or very similar columns
Acceptable for when there is no overlap but simply additional data points to combine lists
Types of joins
Inner join
Each row in one table is combined horizontally with any row in another table that has the same value
Most common
Outer join
Left outer join
Produces same results as inner join but adds any rows from the left table that do not have corresponding rows in the right table
Null ex: indicate that a customer bought nothing yet
Most useful for collecting as much customer data as possible
Right outer join
Produces the same result as inner join but adds any rows from the right table that do not have corresponding rows in the left table
Null ex: Indicate that we do not know the person who bought a product is
Useful for abandoned shopping cards- anonymous users
Full outer join
Produces same result as inner join + left outer join + right outer join
All information from both tables
Great for connecting different lists if there are overlaps between rows and a shared identifier to allow for integration
Joins vs. Unions example