Chapter 9: Data Integration
Additional data =
better predictions from algorithms
additional features
better for ML
It is key to aggregate data from multiple sources
methods of integration
Join
Combines two datasets on a shared identity value
Ex. Customer identifier
one or more rows are created for each matching identity value
types of joins
Inner Join
Each row in the left table is combined horizontally with any row in the right table that has the same identity value.
Outer Join
Left outer join
adds any rows from the left table that do not have corresponding rows in the right table, filling the missing right-table columns with nulls.
Right outer join
adds any rows from the right table that do not have corresponding rows in the left table, filling the missing left-table columns with nulls.
Full outer join
inner join + left outer join + right outer join. Information from both tables combined.
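The four join types can be compared side by side in a minimal pandas sketch. The tables `customers` and `orders`, their columns, and their values are hypothetical and chosen only to show which rows each join keeps.

```python
import pandas as pd

# Hypothetical tables that share a customer identifier
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"customer_id": [2, 3, 3, 4], "amount": [10.0, 20.0, 5.0, 7.5]})

# Inner join: only identity values present in both tables
inner = customers.merge(orders, on="customer_id", how="inner")

# Left outer join: inner join plus unmatched left rows (amount is null)
left = customers.merge(orders, on="customer_id", how="left")

# Right outer join: inner join plus unmatched right rows (name is null)
right = customers.merge(orders, on="customer_id", how="right")

# Full outer join: all rows from both tables
full = customers.merge(orders, on="customer_id", how="outer")

for label, frame in [("inner", inner), ("left", left), ("right", right), ("full", full)]:
    print(label, len(frame))
```

Customer 3 has two orders, so that identity value produces two joined rows; customer 1 and order customer 4 appear only in the outer joins.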
Join several tables together so that the result is suitable for ML
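One possible way to chain joins into a single ML-ready table is sketched below. The tables, the column names, and the choice to aggregate orders to one row per customer before joining are all assumptions made for illustration, not something the chapter prescribes.

```python
import pandas as pd

# Hypothetical source tables keyed by customer_id
customers = pd.DataFrame({"customer_id": [1, 2, 3], "age": [34, 51, 27]})
orders = pd.DataFrame({"customer_id": [1, 1, 3], "amount": [20.0, 5.0, 12.5]})
labels = pd.DataFrame({"customer_id": [1, 2, 3], "churned": [0, 1, 0]})

# Assumption: aggregate orders to one row per customer first,
# so the join does not create several rows for the same customer
order_totals = (
    orders.groupby("customer_id", as_index=False)
    .agg(total_amount=("amount", "sum"), order_count=("amount", "count"))
)

# Left joins keep every customer; customers without orders get zeros
ml_table = (
    customers
    .merge(order_totals, on="customer_id", how="left")
    .merge(labels, on="customer_id", how="left")
    .fillna({"total_amount": 0.0, "order_count": 0})
)

print(ml_table)
```

The result has one row per customer, with features and the target in the same table.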
Union
stacks similar columns on top of each other
no overlap, simply additional data points
combine training and test data
ensures all modifications are shared by both sets
combines two data sets
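A union of training and test data might look like the following pandas sketch. The column names, the `source` tag used to split the sets apart again, and the `income_k` transformation are hypothetical.

```python
import pandas as pd

# Hypothetical training and test sets with the same feature columns
train = pd.DataFrame({"age": [34, 51], "income": [40_000, 62_000], "target": [0, 1]})
test = pd.DataFrame({"age": [27, 45], "income": [38_000, 55_000]})

# Union: stack the rows, tagging each row's origin so the sets can be separated later
combined = pd.concat(
    [train.assign(source="train"), test.assign(source="test")],
    ignore_index=True,
)

# A modification made once on the combined data reaches both sets
combined["income_k"] = combined["income"] / 1000

# Split back into the original sets; the test set has no target column
train_out = combined[combined["source"] == "train"].drop(columns="source")
test_out = combined[combined["source"] == "test"].drop(columns=["source", "target"])
```

Because the test rows have no `target`, that column is null for them in `combined` and is dropped again when the test set is split out.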