Chapter 9: Data Integration
Additional data =
better predictions from algorithms
better for ML
It is key to aggregate data from multiple sources
methods of integration
Join: combines two datasets that share an identity value
Ex. customer identifier
each matching identity value produces one or more rows in the result
types of joins
Inner join
Each row in the left table is combined horizontally with any row in the right table that has the same identity value.
Left outer join
adds to the inner join result any rows from the left table that do not have corresponding rows in the right table. The missing right-table values are filled with nulls.
Right outer join
adds to the inner join result any rows from the right table that do not have corresponding rows in the left table. The missing left-table values are filled with nulls.
Full outer join
inner join + left outer join + right outer join. Every row from both tables is kept, with nulls wherever a row has no match.
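The four join types above can be sketched in a few lines of plain Python. This is only an illustration, not how any particular tool implements joins: the `join` function, the `customers` and `orders` tables, and the `customer_id` column are all made up for the example, and tables are represented as lists of dicts.

```python
def join(left, right, key, how="inner"):
    """Join two tables (lists of dict rows) on `key`.

    `how` is "inner", "left", "right", or "outer" (full outer join).
    """
    if how not in ("inner", "left", "right", "outer"):
        raise ValueError(f"unknown join type: {how}")
    left_cols = {col for row in left for col in row}
    right_cols = {col for row in right for col in row}

    # Group the right rows by identity value for quick lookup.
    right_by_key = {}
    for rrow in right:
        right_by_key.setdefault(rrow[key], []).append(rrow)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = right_by_key.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            # Inner part: one output row per matching pair.
            result.extend({**lrow, **rrow} for rrow in matches)
        elif how in ("left", "outer"):
            # Unmatched left row: right-table columns become None (null).
            result.append({**dict.fromkeys(right_cols), **lrow})

    if how in ("right", "outer"):
        for rrow in right:
            if rrow[key] not in matched_keys:
                # Unmatched right row: left-table columns become None (null).
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


# Hypothetical example tables sharing the identity column "customer_id".
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Cy"},   # has no orders
]
orders = [
    {"customer_id": 1, "amount": 20},
    {"customer_id": 1, "amount": 35},   # second order for customer 1
    {"customer_id": 2, "amount": 10},
    {"customer_id": 4, "amount": 99},   # no matching customer
]

inner = join(customers, orders, "customer_id", "inner")        # 3 rows
left_outer = join(customers, orders, "customer_id", "left")    # 4 rows
right_outer = join(customers, orders, "customer_id", "right")  # 4 rows
full_outer = join(customers, orders, "customer_id", "outer")   # 5 rows
```

Customer 1 appears twice in every result because two order rows share its identity value. Customer 3 only appears in the left and full outer joins (with a null `amount`), and the order for customer 4 only appears in the right and full outer joins (with a null `name`).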
Join several tables together so that the result is suitable for ML
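As a rough sketch of joining several tables into one ML-ready table, the example below chains left outer joins so that every row of the base table survives. The `left_join` helper, the table names, and the columns (`churned`, `region`, `logins`) are hypothetical, and the helper assumes each lookup table has at most one row per identity value.

```python
from functools import reduce


def left_join(left, right, key):
    """Left outer join, assuming `right` has at most one row per key value."""
    right_cols = {col for row in right for col in row}
    lookup = {row[key]: row for row in right}
    empty = dict.fromkeys(right_cols)  # all right-table columns set to None
    # Start from nulls, overlay the matching right row (if any), then the left row.
    return [{**empty, **lookup.get(row[key], {}), **row} for row in left]


# Hypothetical base table (one row per customer, with the target column).
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
]
# Hypothetical lookup tables that each add one feature.
profiles = [
    {"customer_id": 1, "region": "north"},
    {"customer_id": 2, "region": "south"},
]
usage = [
    {"customer_id": 1, "logins": 14},
    {"customer_id": 3, "logins": 2},
]

# Join the lookup tables onto the base table one after another.
ml_table = reduce(
    lambda table, extra: left_join(table, extra, "customer_id"),
    [profiles, usage],
    customers,
)
```

The result has one row per customer and one column per feature, with nulls where a lookup table had no entry for that customer.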
Union: stacks similar columns on top of each other
no overlap, simply additional data points
Ex. combine training and test data
ensures all modifications are applied to both
combines two data sets
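A minimal sketch of stacking training and test data, applying one modification to the combined table, and splitting it back apart. The `union` helper, the `source` tag column, the sample rows, and the city clean-up step are all assumptions made for illustration.

```python
def union(*tables):
    """Stack tables (lists of dict rows) that share the same columns."""
    stacked = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("all tables must have the same columns")
            stacked.append(dict(row))  # copy so the inputs stay unchanged
    return stacked


# Hypothetical training and test rows with the same columns.
train = [
    {"id": 1, "city": " boston "},
    {"id": 2, "city": "Denver"},
]
test = [
    {"id": 3, "city": "DENVER "},
]

# Tag each row with its origin so the tables can be separated again.
tagged_train = [{**row, "source": "train"} for row in train]
tagged_test = [{**row, "source": "test"} for row in test]

combined = union(tagged_train, tagged_test)

# One modification, written once, reaches both training and test rows.
for row in combined:
    row["city"] = row["city"].strip().lower()

train_clean = [row for row in combined if row["source"] == "train"]
test_clean = [row for row in combined if row["source"] == "test"]
```

Because the clean-up runs on the stacked table, the training and test rows end up with identically formatted values without the step being repeated for each table.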