Chapter 9: Data Integration (Integrating Data: (Unions (to access more…
Chapter 9: Data Integration
to access more features
Combines 2 datasets with a shared identity value
to access more observations
based on the assumption that there are multiple columns in common between A and C
lines up each columns containing similar info on top of another, creating a new table
Each row in the left table is combined horizontally (see Figure 9.2) with any row in the right table that has the same identity value. For example, if we join based on the identity CustomerID, which both the Customers and the Orderstables contain, the customer information and order information for each customer is combined into a third table, containing repeated information on the customer for each order they have made
Left outer join
s join produces the same result as an inner join but also adds any rows from the left table that do not have corresponding rows in the right table. Below the rows outlined in the above cell, we will also get rows containing customers who never placed an order. For such customers, the order-related columns will contain NULLs (a sign that there is nothing there). Left outer joins might in this case be used if our goal is to find out why some customers place orders and some do not. An inner join would exclude all the customers that did not place an order.
Right Outer Join
is join produces the same result as an inner join, but also adds any rows from the right table that donot have corresponding rows in the left table. In the customer and order case, this type of join might be used to combine orders or near-orders, such as the common problem of abandoned shopping carts. Such shopping carts will be filled by an anonymous user and registered in our system as an incomplete order when the potential customer leaves the site without registering as a customer and paying for the products. This kind of data would be critical if we suspected that the specific products themselves led to abandonment of shopping carts.
Full outer join
This join produces the same result as an inner join + a left outer join + a right outer join. This means that any customer without orders is added to the resulting table, and any order not connected to a customer is also added to the resulting table
rform unions when we have datasets that contain unique sets of cases sharing the same or very similar columns.