Chapter 9: Data Integration
joins
to access more features
combines two datasets with a shared identity
two types
inner
each row in the left table is combined horizontally
with any row in the right table that has the same identity value
outer
right outer
same result as an inner join
also adds any rows from the right table that do not have any corresponding rows in the left table
left outer
produces the same result as an inner join
also adds any rows from the left table that do not have any corresponding rows in the right table
most useful for collecting as much customer data as possible
full outer
combines the results of the inner, left outer, and right outer joins into one table
good if there are overlaps between rows and a shared identifier to allow for integration
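Sketch of the join types above, in plain Python. Everything here is an assumption made for illustration: the `join` helper, the `customers` and `payments` tables, and the `id` column standing in for the shared identity. Rows with no match on the other side get `None` in the missing columns.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on a shared key column.

    how: "inner", "left", "right", or "full" (hypothetical names).
    """
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}
    result = []
    matched_right = set()  # positions of right rows that found a partner

    for l_row in left:
        matches = [(i, r) for i, r in enumerate(right) if r[key] == l_row[key]]
        for i, r_row in matches:
            matched_right.add(i)
            result.append({**l_row, **r_row})  # combine horizontally
        if not matches and how in ("left", "full"):
            # keep the unmatched left row, with None in the right columns
            result.append({**dict.fromkeys(right_cols), **l_row})

    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # keep the unmatched right row, with None in the left columns
                result.append({**dict.fromkeys(left_cols), **r_row})
    return result


# Made-up example tables sharing the identity column "id"
customers = [
    {"id": 1, "name": "Ana"},
    {"id": 2, "name": "Ben"},
    {"id": 3, "name": "Cy"},
]
payments = [
    {"id": 2, "paid": 500},
    {"id": 3, "paid": 750},
    {"id": 4, "paid": 120},
]

inner_rows = join(customers, payments, "id", "inner")
left_rows = join(customers, payments, "id", "left")
right_rows = join(customers, payments, "id", "right")
full_rows = join(customers, payments, "id", "full")
```

With these tables, the inner join keeps only ids 2 and 3, the left outer join adds id 1, the right outer join adds id 4, and the full outer join keeps all four ids.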
unions
to access more observations
based on the assumption that there are multiple columns in common between A & C
lines up each column containing similar information on top of another
which creates a new table
Example
if a company has multiple customers and their records are stored in various databases, a union will create one table containing all customers for that company
combine two datasets
generally performed when datasets contain unique sets of cases sharing the same or very similar columns
combining the college themed data (course enrollment, tuition payments, etc.)
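Sketch of a union in plain Python, following the customer example above. The `union` helper and the `east` and `west` tables are hypothetical. It assumes every table has exactly the same columns and that the first table is not empty.

```python
def union(*tables):
    """Stack tables (lists of dict rows) that share the same columns."""
    # Assumption: the first table is non-empty and defines the column order.
    columns = list(tables[0][0].keys())
    combined = []
    for table in tables:
        for row in table:
            if set(row) != set(columns):
                raise ValueError("tables must share the same columns")
            # Line each column up under the matching column of the first table
            combined.append({c: row[c] for c in columns})
    return combined


# Made-up customer records stored in two separate databases
east = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
west = [{"name": "Cy", "id": 3}]  # same columns, different order

all_customers = union(east, west)
```

The result is a new table with more observations (three rows here) and the same columns, which is the opposite trade-off from a join.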
Always use a union if:
machine learning example
testing/training
imputation
replacing missing data in a column with reasonable assigned values
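Sketch of imputation in plain Python. Using the column mean as the "reasonable value" is an assumption made here, since the notes do not say which value to assign. The `impute_mean` helper and the `tuition` table are hypothetical, and missing entries are assumed to be stored as `None`.

```python
from statistics import mean


def impute_mean(rows, column):
    """Return new rows with None in `column` replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)  # assumption: the mean is a reasonable stand-in
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Made-up tuition payments with one missing value
tuition = [
    {"student": "A", "paid": 1000},
    {"student": "B", "paid": None},
    {"student": "C", "paid": 2000},
]

filled = impute_mean(tuition, "paid")
```

The original rows are left unchanged, so the imputed table can be compared against the raw one.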
Join Example
Note to self: refer to annotated printed copy of book excerpt