Chapter 9: Data Integration
Joins
combine two datasets on a shared identifier value (a key)
Tables
can be any rectangular matrix of data arranged in columns, where each column begins with a descriptive name (a header)
2 types
Inner
keeps only rows whose identifier appears in both tables; suits carefully curated databases
Outer
left outer join
keeps every row of the left table, matched or not; fits the goal of collecting as much customer data as possible
right outer join
full outer join
connects lists whose rows overlap, using a shared identifier to integrate them
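The join types above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the helper names `columns` and `join`, the sample tables `customers` and `orders`, and the key `customer_id` are all hypothetical, and the sketch assumes every row in a table has the same columns and that non-key column names do not collide between the two tables.

```python
def columns(rows, key):
    """Collect the non-key column names of a table in first-seen order."""
    cols = []
    for row in rows:
        for name in row:
            if name != key and name not in cols:
                cols.append(name)
    return cols


def join(left, right, key, how="inner"):
    """Join two tables (lists of dict rows) on a shared identifier column.

    how: "inner", "left", "right", or "full".
    """
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")

    left_cols = columns(left, key)
    right_cols = columns(right, key)

    # Index the right table by identifier so each lookup is cheap.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    result = []
    matched = set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        if matches:
            matched.add(lrow[key])
            for rrow in matches:
                result.append({**lrow, **rrow})
        elif how in ("left", "full"):
            # Unmatched left row: keep it, fill the right columns with None.
            result.append({**lrow, **{c: None for c in right_cols}})

    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in matched:
                # Unmatched right row: keep it, fill the left columns with None.
                result.append({key: rrow[key], **{c: None for c in left_cols}, **rrow})
    return result


# Hypothetical sample data: customer 1 has no order, order 4 has no customer.
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},
]
orders = [
    {"customer_id": 2, "total": 30},
    {"customer_id": 3, "total": 45},
    {"customer_id": 4, "total": 10},
]

inner_rows = join(customers, orders, "customer_id", how="inner")
left_rows = join(customers, orders, "customer_id", how="left")
right_rows = join(customers, orders, "customer_id", how="right")
full_rows = join(customers, orders, "customer_id", how="full")
```

With this sample data, the inner join keeps only customers 2 and 3, the left outer join also keeps customer 1 with an empty `total`, the right outer join also keeps order 4 with an empty `name`, and the full outer join keeps all four identifiers.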
Union
lines up columns containing similar information and stacks the rows of one dataset on top of the other
based on the assumption that datasets A and C have multiple columns in common
if the rows do not overlap but contribute additional data points, a union is the better choice
e.g. a dataset that combines training and test cases for machine learning
common where companies seek to evaluate employee skillsets
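A union can be sketched the same way. This is an illustration under stated assumptions: the helper name `union` and the sample tables `train` and `test` are hypothetical, and the sketch takes the strict view that every row must have exactly the same columns, which is stronger than the "multiple columns in common" condition in the notes.

```python
def union(*tables):
    """Stack tables (lists of dict rows) that share the same columns."""
    rows = [row for table in tables for row in table]
    if not rows:
        return []

    # Assumption of this sketch: all rows must have identical column names.
    expected = set(rows[0])
    for row in rows:
        if set(row) != expected:
            raise ValueError(f"column mismatch: {sorted(row)} vs {sorted(expected)}")

    # Copy each row so the combined table does not alias the inputs.
    return [dict(row) for row in rows]


# Hypothetical training and test cases with the same columns and no shared rows.
train = [
    {"feature": 0.2, "label": 0},
    {"feature": 0.9, "label": 1},
]
test = [
    {"feature": 0.5, "label": 1},
]

combined = union(train, test)
```

Unlike a join, no identifier is matched here: the combined table simply has as many rows as `train` and `test` together.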