Data Integration
Joins
Combines 2 datasets that share an identifier value
One or more combined rows are created
Rows are matched on the column containing the ID
Inner Join
Produces header row and 2 data rows
Most commonly used for curated databases
Each row is combined horizontally with every row in the other dataset that has the same ID
Join on same ID (CustomerID)
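A minimal sketch of an inner join in plain Python, assuming each row is a dict and `CustomerID` is the shared identifier. The `customers`/`orders` tables and the `inner_join` helper are hypothetical, chosen so the result has 2 data rows like the note above. In practice, SQL `INNER JOIN` or pandas `DataFrame.merge(..., how="inner")` does the same job.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical tables: CustomerID is the shared identifier.
customers: list[Row] = [
    {"CustomerID": 1, "Name": "Ana"},
    {"CustomerID": 2, "Name": "Ben"},
    {"CustomerID": 3, "Name": "Cleo"},
]
orders: list[Row] = [
    {"CustomerID": 1, "OrderID": 101},
    {"CustomerID": 2, "OrderID": 102},
    {"CustomerID": 4, "OrderID": 103},
]


def inner_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Combine each left row with every right row that has the same key value."""
    # Index the right table by key so each lookup is a single dict access.
    index: dict[Any, list[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        # Left rows with no match are dropped; that is what makes it "inner".
        for rrow in index.get(lrow[key], []):
            joined.append({**lrow, **rrow})  # combine the two rows horizontally
    return joined


for row in inner_join(customers, orders, "CustomerID"):
    print(row)
```

Customer 3 (no order) and order 103 (unknown customer 4) are both left out, which is why only the overlapping IDs survive.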
Outer Join
Left outer join
Produces header row and 3 data rows
Use if the goal is to collect as much customer data as possible
Includes every row an inner join produces
Also adds rows from the left table that have no match in the right table
Right outer join
Produces header row and 3 data rows
Adds rows from the right table that don't have corresponding rows in the left table
Ex: abandoned shopping carts
Full outer join
Produces header row and 4 data rows
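The three outer joins differ only in which unmatched rows they keep. A minimal sketch in plain Python, assuming dict rows and hypothetical `customers`/`orders` tables chosen so the row counts match the notes (left 3, right 3, full 4); the `_outer_join`, `left_join`, `right_join` and `full_outer_join` helpers are illustrative, not a library API. Missing values are filled with `None`, much as SQL fills them with `NULL`.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical tables: customer 3 has no order; order 103 has no known customer.
customers: list[Row] = [
    {"CustomerID": 1, "Name": "Ana"},
    {"CustomerID": 2, "Name": "Ben"},
    {"CustomerID": 3, "Name": "Cleo"},
]
orders: list[Row] = [
    {"CustomerID": 1, "OrderID": 101},
    {"CustomerID": 2, "OrderID": 102},
    {"CustomerID": 4, "OrderID": 103},
]


def _outer_join(left: list[Row], right: list[Row], key: str,
                keep_left: bool, keep_right: bool) -> list[Row]:
    """Join on `key`, optionally keeping unmatched rows padded with None."""
    # Ordered union of all column names, so every output row has the same columns.
    columns = list(dict.fromkeys(c for row in left + right for c in row))

    def combine(*parts: Row) -> Row:
        out = dict.fromkeys(columns)  # every column starts as None
        for part in parts:
            out.update(part)
        return out

    right_index: dict[Any, list[Row]] = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)
    left_keys = {row[key] for row in left}

    joined = []
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        for rrow in matches:
            joined.append(combine(lrow, rrow))  # matched rows: same as inner join
        if not matches and keep_left:
            joined.append(combine(lrow))  # unmatched left row
    if keep_right:
        for rrow in right:
            if rrow[key] not in left_keys:
                joined.append(combine(rrow))  # unmatched right row
    return joined


def left_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    return _outer_join(left, right, key, keep_left=True, keep_right=False)


def right_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    return _outer_join(left, right, key, keep_left=False, keep_right=True)


def full_outer_join(left: list[Row], right: list[Row], key: str) -> list[Row]:
    return _outer_join(left, right, key, keep_left=True, keep_right=True)


for name, join in [("left", left_join), ("right", right_join), ("full", full_outer_join)]:
    rows = join(customers, orders, "CustomerID")
    print(name, len(rows), rows[-1])
```

Row order in this sketch is not meaningful: matched rows come first, then unmatched right-table rows. The unmatched order row (customer 4) is the kind of row a right outer join keeps and a left outer join drops.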
Integrate different lists with a join IF:
There is overlap between rows
There is a shared identifier to allow for integration
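The two conditions for joining (a shared identifier, and overlap between rows) can be checked before attempting the join. A minimal sketch, assuming dict rows; `can_join` and the student lists are hypothetical. When the check fails for lack of overlap, the notes point to a union instead.

```python
from typing import Any

Row = dict[str, Any]


def can_join(left: list[Row], right: list[Row], key: str) -> bool:
    """Check whether two lists meet the conditions for a join on `key`."""
    # Condition 1: a shared identifier, present in every row of both lists.
    if not all(key in row for row in left + right):
        return False
    # Condition 2: overlap between rows, i.e. at least one ID appears in both lists.
    left_ids = {row[key] for row in left}
    right_ids = {row[key] for row in right}
    return not left_ids.isdisjoint(right_ids)


# Hypothetical lists.
enrolled = [{"StudentID": 1, "Major": "Math"}, {"StudentID": 2, "Major": "Art"}]
grades = [{"StudentID": 2, "GPA": 3.4}, {"StudentID": 5, "GPA": 3.9}]
alumni = [{"AlumniID": 9, "Year": 2015}]

print(can_join(enrolled, grades, "StudentID"))  # True: shared column, ID 2 overlaps
print(can_join(enrolled, alumni, "StudentID"))  # False: alumni has no StudentID
```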
Unions
Based on the assumption that the datasets have multiple columns in common
Stacks the datasets vertically, lining up columns that contain similar info
Use when the datasets contain unique sets of cases
Ex: Integrate a whole university's info
Perform high-quality analytics in one system
Datasets that share the same/similar columns
Use if there is no overlap between rows, just more data points
Combine lists of different people
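A union can be sketched as stacking rows and aligning columns by name. A minimal sketch in plain Python, assuming dict rows; `union_all` and the two student lists are hypothetical. It keeps duplicate rows, like SQL's `UNION ALL` (plain SQL `UNION` also removes duplicates); pandas `pd.concat` aligns columns by name in a similar way.

```python
from typing import Any

Row = dict[str, Any]


def union_all(*datasets: list[Row]) -> list[Row]:
    """Stack datasets vertically, lining up columns with the same name.

    Columns missing from a dataset are filled with None; duplicate rows are kept.
    """
    # Ordered union of column names across all datasets.
    columns = list(dict.fromkeys(c for data in datasets for row in data for c in row))
    stacked = []
    for data in datasets:
        for row in data:
            out = dict.fromkeys(columns)  # start every column as None
            out.update(row)
            stacked.append(out)
    return stacked


# Hypothetical lists of different people with similar columns.
business_students = [
    {"StudentID": 1, "Name": "Ana", "Year": 2},
    {"StudentID": 2, "Name": "Ben", "Year": 3},
]
engineering_students = [
    {"StudentID": 7, "Name": "Dev", "Year": 1, "Advisor": "Kim"},
]

for row in union_all(business_students, engineering_students):
    print(row)
```

No identifier is matched here: the result simply has more rows (cases), not more columns.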
Always the correct tool for:
Training and evaluation data
More data
Access to more data, and to more relevant data
Leads to better predictions
Until we reach the point where more cases are no longer helpful
Look for additional features we don't have
Integrate data from different sources