Chapter 9: Data Integration
Introduction
additional data leads to better predictions
two methods
Unions
to access more observations
multiple columns in common
lines up columns containing similar info and stacks the rows into a new table
Joins
to access more features
combines two datasets with a shared identity value
EX: combine customer record with login info
customer visits website many times, results in many rows
one or more new rows are created
Unions
to combine two datasets; datasets sharing the same or similar columns
full outer join (if rows overlap)
training
cases used to train the model(s)
test
cases used to rate the performance of the model(s)
first integrate training/test data sets to be sure that all modifications are shared
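A minimal sketch of why the training and test sets are unioned first, in plain Python. The column names, the sample rows, and the `Source` tag are hypothetical and only serve the illustration; the clean-up step stands in for any modification that must be shared.

```python
# Hypothetical training and test tables that share the same columns.
train = [{"CustId": 1, "Name": " kai "}, {"CustId": 2, "Name": "DAN"}]
test = [{"CustId": 3, "Name": "Ana "}]

# Union: stack the rows of both tables, remembering where each row came from.
combined = [dict(row, Source="train") for row in train] + \
           [dict(row, Source="test") for row in test]

# One shared modification, applied once to every row of the union.
for row in combined:
    row["Name"] = row["Name"].strip().title()

# Split the union back into the two original sets.
train_clean = [r for r in combined if r["Source"] == "train"]
test_clean = [r for r in combined if r["Source"] == "test"]
print(train_clean)
print(test_clean)
```

Because the clean-up runs on the union, it cannot drift between the two sets.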
Imputation
replace missing data in a column with reasonable values
ex: organization keeps separate lists
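A small sketch of imputation, assuming the column mean counts as a "reasonable value". The notes do not name a specific method, so the mean is an assumption, and the `Age` column and its values are hypothetical.

```python
from statistics import mean

# Hypothetical table with one missing value (None) in the Age column.
rows = [
    {"CustId": 1, "Age": 30},
    {"CustId": 2, "Age": None},
    {"CustId": 3, "Age": 40},
]

# Compute the fill value from the rows where Age is known.
known_ages = [r["Age"] for r in rows if r["Age"] is not None]
fill_value = mean(known_ages)

# Replace each missing Age with the fill value; leave known values untouched.
imputed = [dict(r, Age=fill_value if r["Age"] is None else r["Age"]) for r in rows]
print(imputed)
```

Other choices (median, most frequent value, a model-based estimate) would slot into the same place as `fill_value`.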
Joins
Two Types:
Inner Join
each row in the left table is combined horizontally with any row in the right table that has the same identity value
EX: produces header row + 2 data rows
CustId, Name, CustId, Product
1, Kai, 1, pants
1, Kai, 1, bouncing castle
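The inner join example can be sketched in plain Python. The rows come from the example data in these notes; the tuple layout and the function name `inner_join` are assumptions made for illustration.

```python
# Left table: (CustId, Name). Right table: (CustId, Product).
customers = [(1, "Kai"), (2, "Dan")]
orders = [(1, "pants"), (1, "bouncing castle"), (3, "sunglasses")]

def inner_join(left, right):
    """Pair each left row with every right row that has the same CustId."""
    return [l + r for l in left for r in right if l[0] == r[0]]

inner_rows = inner_join(customers, orders)
for row in inner_rows:
    print(row)
```

Dan (no orders) and the sunglasses (no matching customer) both drop out, leaving the two data rows for Kai.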
Outer Join
Left Outer Join
same result as the inner join, but also adds any rows from the left table that do NOT have corresponding rows in the right table
EX: produces header row + 3 data rows
CustId, Name, CustId, Product
1, Kai, 1, pants
1, Kai, 1, bouncing castle
2, Dan, NULL, NULL
NULLs: a sign that there is nothing there
EX: customers who never placed an order
*NULL: indicates that Dan bought nothing
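A sketch of the left outer join on the same example data, with Python's `None` standing in for NULL. The function name `left_outer_join` and the tuple layout are assumptions.

```python
# Left table: (CustId, Name). Right table: (CustId, Product).
customers = [(1, "Kai"), (2, "Dan")]
orders = [(1, "pants"), (1, "bouncing castle"), (3, "sunglasses")]

def left_outer_join(left, right):
    """Keep every left row; pad with None when no right row matches."""
    result = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            result.extend(l + r for r in matches)
        else:
            result.append(l + (None, None))  # customer without orders
    return result

left_rows = left_outer_join(customers, orders)
for row in left_rows:
    print(row)
```

Dan is kept with `None` in both order columns, which is how a customer who never placed an order shows up.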
Right Outer Join
same result as the inner join, but also adds any rows from the right table that do NOT have corresponding rows in the left table
EX: produces header row + 3 data rows
CustId, Name, CustId, Product
1, Kai, 1, pants
1, Kai, 1, bouncing castle
NULL, NULL, 3, sunglasses
NULL: indicates that we do not know who bought the sunglasses
EX: abandoned carts
used to combine orders or near-orders
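The right outer join mirrors the left one: every row of the right table is kept. A minimal sketch on the same example data, with `None` standing in for NULL; the function name `right_outer_join` is an assumption.

```python
# Left table: (CustId, Name). Right table: (CustId, Product).
customers = [(1, "Kai"), (2, "Dan")]
orders = [(1, "pants"), (1, "bouncing castle"), (3, "sunglasses")]

def right_outer_join(left, right):
    """Keep every right row; pad with None when no left row matches."""
    result = []
    for r in right:
        matches = [l for l in left if l[0] == r[0]]
        if matches:
            result.extend(l + r for l in matches)
        else:
            result.append((None, None) + r)  # order without a known customer
    return result

right_rows = right_outer_join(customers, orders)
for row in right_rows:
    print(row)
```

The sunglasses order survives with `None` in the customer columns, while Dan, who has no orders, is dropped.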
Full Outer Join
combines the results of the inner, left outer, and right outer joins
EX: produces header row + 4 data rows
CustId, Name, CustId, Product
1, Kai, 1, pants
1, Kai, 1, bouncing castle
2, Dan, NULL, NULL
NULL, NULL, 3, sunglasses
all info from both tables combined
any customer without orders is added to resulting table
any order not connected to a customer is added to resulting table
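A sketch of the full outer join on the same example data: a left outer join followed by the right rows that found no match. `None` stands in for NULL, and the function name `full_outer_join` is an assumption.

```python
# Left table: (CustId, Name). Right table: (CustId, Product).
customers = [(1, "Kai"), (2, "Dan")]
orders = [(1, "pants"), (1, "bouncing castle"), (3, "sunglasses")]

def full_outer_join(left, right):
    """Keep every row from both tables, padding the missing side with None."""
    result = []
    # Every left row, matched or not (the left outer join part).
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            result.extend(l + r for r in matches)
        else:
            result.append(l + (None, None))
    # Right rows whose CustId never appears in the left table.
    left_ids = {l[0] for l in left}
    result.extend((None, None) + r for r in right if r[0] not in left_ids)
    return result

full_rows = full_outer_join(customers, orders)
for row in full_rows:
    print(row)
```

This yields the four data rows listed above: two for Kai, one for Dan without orders, and one for the unclaimed sunglasses.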
Tables
matrix containing data in columns, each with a descriptive name