Ch. 12 Data Reduction and Splitting
Unique Rows
Duplicate rows should be removed
Partial Match Removal
remove full rows -> identical content in a few columns
rows to keep listed first
which columns should be identical for duplicates to be removed
(unique function)
summarize function
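Partial match removal can be sketched in plain Python as below. This is a minimal illustration, not the unique or summarize function named in these notes: the helper name `drop_partial_duplicates`, the column names, and the sample rows are all made up for the example. It assumes each row is a dict and that the row to keep appears first.

```python
def drop_partial_duplicates(rows, key_columns):
    """Keep the first row seen for each combination of values in key_columns.

    Hypothetical helper: rows is a list of dicts, key_columns names the
    columns that must be identical for two rows to count as duplicates.
    """
    seen = set()
    kept = []
    for row in rows:
        # Only the chosen columns decide whether a row is a duplicate.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Illustrative data: ids 1 and 2 match on name and city but differ on id.
rows = [
    {"id": 1, "name": "Ana", "city": "Boulder"},
    {"id": 2, "name": "Ana", "city": "Boulder"},
    {"id": 3, "name": "Ben", "city": "Denver"},
]
deduped = drop_partial_duplicates(rows, ["name", "city"])
```

Because the first occurrence wins, sorting the rows before the call is what controls which of the duplicates survives.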
Complete Match Removal
remove full rows -> identical content in all columns
special case of partial match removal
all values in all columns must match the same value in a prior row
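Complete match removal can be sketched the same way, with every column taking part in the comparison. This is an illustration only: the helper name `drop_complete_duplicates` and the sample rows are hypothetical, and rows are assumed to be dicts with string column names and hashable values.

```python
def drop_complete_duplicates(rows):
    """Drop a row only if every column matches a row that came earlier.

    Hypothetical helper: rows is a list of dicts sharing the same columns.
    """
    seen = set()
    kept = []
    for row in rows:
        # Sorting by column name makes the key independent of column order.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Illustrative data: the second row repeats the first in every column,
# the third differs in one column and is therefore kept.
rows = [
    {"name": "Ana", "city": "Boulder"},
    {"name": "Ana", "city": "Boulder"},
    {"name": "Ana", "city": "Denver"},
]
deduped = drop_complete_duplicates(rows)
```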
Filtering
splitting data into 2 separate tables
train files
test files
data modification - feature engineering
data transformations that apply to subset of available data
2 separate tables - same columns but different rows
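Filtering into two tables can be sketched as a single pass that sends each row to one of two lists. This is a minimal illustration under assumptions: the helper name `split_rows`, the `source` column, and the sample rows are hypothetical and not taken from the notes.

```python
def split_rows(rows, condition):
    """Split rows into (matching, non_matching) based on condition(row).

    Hypothetical helper: both outputs keep the same columns as the input,
    and every input row lands in exactly one of them.
    """
    matching = []
    non_matching = []
    for row in rows:
        if condition(row):
            matching.append(row)
        else:
            non_matching.append(row)
    return matching, non_matching


# Illustrative data: a made-up "source" column marks where each row came from.
rows = [
    {"id": 1, "source": "train", "value": 10},
    {"id": 2, "source": "test", "value": 12},
    {"id": 3, "source": "train", "value": 9},
]
train, test = split_rows(rows, lambda row: row["source"] == "train")
```

A transformation that should apply to only part of the data can then be run on one of the two tables while the other is left unchanged.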
Jimmy Frainey
jafr4672@colorado.edu