Chapter 12: Data Reduction and Splitting
12.2 Filtering
Split up data set into two separate tables based on characteristics of data
ex: splitting products w/ different weight units into different tables
Useful for test & train files
Modifications applied non-uniformly across test & train files will harm the model's predictive ability
Useful for applying data transformations to a subset of data
ex: Titanic data set, where some passengers have ages and others are missing them
Info about passengers w/ ages can be used to predict those with missing ages
filter used to put passengers w/ age into training set
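A minimal sketch of the filtering idea above, assuming rows are held as plain Python dictionaries. The passenger records, the `split_rows` helper, and the use of `None` to mark a missing age are all hypothetical and only for illustration:

```python
# Hypothetical passenger records; None marks a missing age.
passengers = [
    {"name": "A", "age": 29.0, "fare": 7.25},
    {"name": "B", "age": None, "fare": 8.05},
    {"name": "C", "age": 41.0, "fare": 53.10},
    {"name": "D", "age": None, "fare": 13.00},
]


def split_rows(rows, predicate):
    """Split rows into (matching, non_matching) tables using a filter predicate."""
    matching, non_matching = [], []
    for row in rows:
        (matching if predicate(row) else non_matching).append(row)
    return matching, non_matching


# Rows with a known age go to the training table;
# rows with a missing age are the ones whose age is to be predicted.
with_age, missing_age = split_rows(passengers, lambda r: r["age"] is not None)
```

Every row lands in exactly one of the two tables, so the split loses no data. The same helper would cover the weight-unit example by swapping in a predicate on the unit column.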
12.1 Unique Rows
Remove duplicate rows
Partial match removal
Remove full rows because of a few identical columns
ex: app using device ID and locations to find user's home address
Steps
Sort rows so that the rows to keep are listed first
Specify which columns should be identical to remove duplicates
Complete match removal
Remove full rows based on identical content in all columns
special case of partial match removal
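The two kinds of duplicate removal can be sketched as one function, assuming rows are plain Python dictionaries with hashable values and that the first row of each duplicate group is the one kept. The `drop_duplicates` helper, its parameters, and the device ping data are hypothetical:

```python
def drop_duplicates(rows, key_columns=None, sort_key=None):
    """Keep the first row for each distinct combination of key_columns.

    key_columns=None compares every column (complete match removal).
    sort_key, if given, orders the rows first so the preferred row is kept.
    """
    if sort_key is not None:
        rows = sorted(rows, key=sort_key)
    seen = set()
    kept = []
    for row in rows:
        columns = key_columns if key_columns is not None else sorted(row)
        signature = tuple((col, row[col]) for col in columns)
        if signature not in seen:
            seen.add(signature)
            kept.append(row)
    return kept


# Hypothetical device pings.
pings = [
    {"device_id": "d1", "location": "home", "hour": 23},
    {"device_id": "d1", "location": "home", "hour": 2},
    {"device_id": "d1", "location": "office", "hour": 10},
    {"device_id": "d2", "location": "home", "hour": 1},
    {"device_id": "d2", "location": "home", "hour": 1},
]

# Partial match: one row per (device_id, location), keeping the earliest hour.
partial = drop_duplicates(
    pings,
    key_columns=["device_id", "location"],
    sort_key=lambda r: r["hour"],
)

# Complete match: only rows identical in every column are removed.
complete = drop_duplicates(pings)
```

Passing every column name as `key_columns` gives the same result as leaving it unset, which is why complete match removal is a special case of partial match removal.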