Data Reduction and Splitting - 12.1 & 12.2
Obtaining unique rows
datasets can contain duplicate rows
two options for duplicate row removal
partial match removal
identical content in a few columns
removal of full rows
conducting partial match
first sort the data so the rows to "keep" come first
specify identical columns for duplicate row removal
complete match removal
identical content in all columns
removal of full rows
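The two options above can be sketched in plain Python. This is a minimal illustration, not a specific tool's unique function: `drop_duplicates`, the `customers` rows, and the column names are all hypothetical. It assumes the data is already sorted so that the row to keep appears first among its duplicates.

```python
def drop_duplicates(rows, key_columns=None):
    """Keep the first row seen for each distinct key.

    key_columns=None compares every column (complete match);
    a list of column names compares only those columns (partial match).
    """
    seen = set()
    kept = []
    for row in rows:
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)  # the whole row is kept or removed, never part of it
    return kept


# Hypothetical data, pre-sorted so the most recent row per customer comes first.
customers = [
    {"name": "Ann", "city": "Oslo", "updated": "2024-03-01"},
    {"name": "Ann", "city": "Oslo", "updated": "2024-01-15"},
    {"name": "Ann", "city": "Oslo", "updated": "2024-01-15"},
    {"name": "Bob", "city": "Rome", "updated": "2024-02-10"},
]

complete = drop_duplicates(customers)                   # all columns must match
partial = drop_duplicates(customers, ["name", "city"])  # only these must match
```

Complete match removes only the exact repeat of Ann's January row, while partial match on `name` and `city` leaves one row per customer.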
Home address example
a business wants to find a customer's home address
first filter data by device ID
then filter by date and time
apply unique function that:
keeps first row from every day
discards all non-unique data
use summarize function to:
"group" rows by unique ID, latitude, and longitude
more accurate with more data
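The home address steps above might look like the following sketch. Several things here are assumptions: the `pings` data is invented, "filter by device ID, then by date and time" is read as ordering the rows, and the most frequent first-of-day location is treated as the home guess.

```python
from collections import Counter

# Hypothetical location pings: (device_id, ISO timestamp, latitude, longitude).
pings = [
    ("dev1", "2024-05-02T00:10", 40.71, -74.00),
    ("dev1", "2024-05-01T00:05", 40.71, -74.00),
    ("dev1", "2024-05-01T12:30", 40.75, -73.99),
    ("dev1", "2024-05-03T00:20", 40.73, -73.98),
    ("dev2", "2024-05-01T01:00", 34.05, -118.24),
]

# Order by device ID, then by date and time (ISO strings sort chronologically).
ordered = sorted(pings, key=lambda p: (p[0], p[1]))

# Unique step: keep only the first row for each (device, day) pair.
first_of_day = {}
for device, stamp, lat, lon in ordered:
    day = stamp[:10]
    first_of_day.setdefault((device, day), (lat, lon))

# Summarize step: count how often each location appears per device.
counts = {}
for (device, _day), location in first_of_day.items():
    counts.setdefault(device, Counter())[location] += 1

# Assumption: the most frequent first-of-day location is the home guess.
home_guess = {device: c.most_common(1)[0][0] for device, c in counts.items()}
```

With more days of data, a single unusual night has less influence on the counts, which matches the note that accuracy improves with more data.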
Filtering
splitting a set of data into two tables
same columns, different rows
useful when a transformation applies to only some of the rows
helpful for imputation of missing data
Northwind dataset
filter by weight specification
gain understanding of packaging/content sold
easier to see when "kg" and "oz" are in different tables
more similar data in each table
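A rough sketch of filtering as a split, using rows in the style of a Northwind products table. The product names, the `quantity_per_unit` column name and values, and the helpers `split_rows` and `has_unit` are hypothetical, chosen only to show the idea.

```python
def split_rows(rows, predicate):
    """Split rows into (matching, rest); both tables keep the same columns."""
    matching, rest = [], []
    for row in rows:
        (matching if predicate(row) else rest).append(row)
    return matching, rest


def has_unit(unit):
    """Build a predicate that checks for a unit token in the weight text."""
    return lambda row: unit in row["quantity_per_unit"].split()


# Hypothetical rows; not taken from the actual dataset.
products = [
    {"product": "Product A", "quantity_per_unit": "10 kg pkg."},
    {"product": "Product B", "quantity_per_unit": "12 - 8 oz jars"},
    {"product": "Product C", "quantity_per_unit": "1 kg pkg."},
    {"product": "Product D", "quantity_per_unit": "24 - 12 oz bottles"},
    {"product": "Product E", "quantity_per_unit": "20 bags"},
]

kg_table, remaining = split_rows(products, has_unit("kg"))
oz_table, other = split_rows(remaining, has_unit("oz"))
```

Each resulting table holds more similar rows, so a unit-specific transformation (for example, converting ounces) can be applied to `oz_table` without touching the others.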