Please enable JavaScript.
Coggle requires JavaScript to display documents.
Data integration (Process of data integrating with method of two…
Data integration
Process of data integrating with method of two introductions:
Join
Join – to access more features
Join combines two datasets with shared identity value, such as customer identifier. Might join row containing your customer record with login info.
After a join, one or more new rows are created, each containing customer info from A on the left and website behavior from B on the right.
Tables- a table can be any rectangular matrix containing data in column form where each column begins with a descriptive name of the data in that column
Two types of join that are most relevant for our purposes
Inner join
When dealing with carefully curated databases, we will use inner join most often! Such databases will generally be designed to not accept an order unless a customer account exists for that order (something called “referential integrity” forces the datavase and any system using it to adhere to rules of conduct like this).
(each row in the left table is combined horizontally with any row in the right table that has the same identity value).
Outer join
o Left outer join (produces the same result as an inner join, but also adds any rows from the left table that do not have corresponding rows in the right table). Below the rows outlined in the above cell, we will also get rows containing customers who never placed an order)
♣ If we are engaged in a project with the goal being to collect as much customer data as possible, then the left outer join is most useful.
o Right outer join (join produces same result as an inner join, but also adds any rows from the right table that do not have any corresponding rows in the left table)
o Full outer join (join produces the same result as an inner join + a left outer join + a right outer join) this means that any customer without orders is added to the resulting table, and any order not connected to a customer is also added to the resulting table.
There is also a Cartesian join, which will combine every row in one table with every row in another table. For example, joining two tables with 100 and 1,000 rows will lead to 100,000 rows.
Union – to access more observations
Based on the assumption that there are multiple columns in common between A and C. A union lines up each column containing similar information on top of another (zip code existing in A and C, for example)
Unions allow us to combine two datasets, shown as top (T) and bottom (B).
We perform unions when we have datasets that contain unique sets of cases sharing the same or very similar columns.