Chapter 12.1-12.2: Data Reduction and Splitting
12.1: Unique Rows
Partial Match Removal
first, sort the data so that the rows TO KEEP come first (the first row of each set of matches is the one kept)
ex: picking out one specific record per date for each phone
use the unique function to keep only each phone's first use of every day (most likely at the user's home location)
then use the "summarize" function to find each phone's most common location, which can be assumed to be the user's home
the resulting table contains only device ID, longitude, and latitude, and can be labeled "home location"
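The phone example above can be sketched in Python with pandas. This is only a sketch under stated assumptions: the notes do not name a tool, so `drop_duplicates` stands in for the "unique" function and a `groupby` count stands in for "summarize". The table, its column names (`device_id`, `timestamp`, `longitude`, `latitude`), and all values are invented for illustration.

```python
import pandas as pd

# Hypothetical location pings; column names and values are invented.
pings = pd.DataFrame({
    "device_id": ["A", "A", "A", "A", "B", "B"],
    "timestamp": pd.to_datetime([
        "2024-01-01 12:00", "2024-01-01 07:00",
        "2024-01-02 06:30", "2024-01-03 08:00",
        "2024-01-01 09:00", "2024-01-02 08:15",
    ]),
    "longitude": [-71.10, -71.06, -71.06, -71.20, -73.99, -73.99],
    "latitude": [42.37, 42.36, 42.36, 42.40, 40.73, 40.73],
})

# 1. Sort so the row to keep (the earliest ping) comes first per device and day.
pings = pings.sort_values(["device_id", "timestamp"])
pings["date"] = pings["timestamp"].dt.date

# 2. Partial match removal: rows match on (device_id, date); keep the first.
first_daily = pings.drop_duplicates(subset=["device_id", "date"], keep="first")

# 3. Summarize: count days per location, then keep the most common per device.
home_location = (
    first_daily.groupby(["device_id", "longitude", "latitude"])
    .size()
    .reset_index(name="days")
    .sort_values(["device_id", "days"], ascending=[True, False])
    .drop_duplicates(subset="device_id", keep="first")
    [["device_id", "longitude", "latitude"]]
    .reset_index(drop=True)
)
```

Here the sort step matters: without it, `keep="first"` would keep whichever ping happened to be listed first, not the earliest one of the day.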
other ex: dairy products
the unique function looks at the Order ID column, leaving only one dairy product per order, which makes it easy to count the number of dairy orders
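A minimal pandas sketch of the dairy example, assuming a hypothetical table with one row per product in an order. The column names (`order_id`, `product`, `category`) and the data are invented, and the initial filter down to dairy rows is an assumption, since the notes start from a table that already holds only dairy products.

```python
import pandas as pd

# Hypothetical order lines: one row per product per order (invented data).
order_lines = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 3, 3],
    "product": ["milk", "bread", "apples", "cheese", "yogurt", "eggs"],
    "category": ["dairy", "bakery", "produce", "dairy", "dairy", "other"],
})

# Assumed first step: keep only the dairy rows.
dairy_lines = order_lines[order_lines["category"] == "dairy"]

# Partial match removal on order_id: at most one dairy row per order remains.
one_per_order = dairy_lines.drop_duplicates(subset="order_id", keep="first")

# Each remaining row is one order containing dairy, so counting rows counts orders.
num_dairy_orders = len(one_per_order)
```

Order 3 contains two dairy products, but after the unique step it contributes a single row to the count.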
Complete Match Removal
(just a special case of partial match removal, where rows must match on every column)
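The relationship between the two kinds of removal can be illustrated with pandas, again using `drop_duplicates` as a stand-in for the "unique" function. The small table is invented for illustration.

```python
import pandas as pd

# Invented rows: the first two are identical in every column.
rows = pd.DataFrame({
    "order_id": [1, 1, 1, 2],
    "product": ["milk", "milk", "cheese", "milk"],
})

# Complete match removal: a row is dropped only if every column matches.
complete = rows.drop_duplicates()

# The same result, written as a partial match over all columns.
all_columns = rows.drop_duplicates(subset=list(rows.columns))

# Partial match removal on order_id alone drops more rows.
partial = rows.drop_duplicates(subset=["order_id"])
```

Matching on all columns removes only the exact duplicate, while matching on `order_id` alone leaves one row per order.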
12: Removing rows or splitting datasets in two
12.2: Filtering
splitting a set of data based on a certain characteristic
filtering has uses beyond separating test and training data
ex: a set of data transformations applies only to a subset of the data
the data can then be filtered into two separate tables (containing the same columns but different rows)
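Splitting one table into two by a filter can be sketched in pandas with a boolean mask and its negation. The table, the column names, and the "domestic vs. international" condition are hypothetical, chosen only to show the pattern.

```python
import pandas as pd

# Invented records; the filter condition below is only an example.
records = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "country": ["US", "CA", "US", "MX"],
    "amount": [10.0, 20.0, 30.0, 40.0],
})

# One boolean value per row: True where the characteristic holds.
is_domestic = records["country"] == "US"

# Two tables with the same columns but different rows.
domestic = records[is_domestic]
international = records[~is_domestic]
```

Because the second table uses the negated mask, every original row lands in exactly one of the two tables.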
ex: may be necessary when imputing missing data
ex: age is missing for many passengers (Titanic example)
use a filter to select only passengers with a known age
then take the average of these ages
then enter this mean for all passengers whose age is missing
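The three imputation steps above can be sketched in pandas. The passenger table here is a tiny invented stand-in, not the real Titanic data, and the column names are assumptions.

```python
import pandas as pd

# Invented stand-in for the passenger table; None marks a missing age.
passengers = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [22.0, None, 38.0, None],
})

# 1. Filter: select only passengers with a known age.
has_age = passengers["age"].notna()

# 2. Take the average of those ages.
mean_age = passengers.loc[has_age, "age"].mean()

# 3. Enter this mean for every passenger whose age is missing.
passengers.loc[~has_age, "age"] = mean_age
```

In pandas, `passengers["age"].fillna(passengers["age"].mean())` gives the same result in one step, since `mean()` skips missing values by default; the longer form is shown because it mirrors the filter-then-average steps in the notes.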