Chapter 12: Data Reduction and Splitting
Data Reduction
Remove duplicate rows
Partial Match Removal
based on identical data in some columns
sort so that the rows to be kept come first
specify which columns must match for a row to be removed
Complete Match removal
Identical content in all columns
Special type of partial match removal
Does not matter which row is removed
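Sketch of both removal types, using only the Python standard library. The records, column names, and the `drop_duplicates` helper are hypothetical and chosen for illustration; the rule "keep the first row seen" is an assumption that makes the sort order decide which row survives.

```python
def drop_duplicates(rows, key_columns=None):
    """Keep the first row for each distinct key; drop later matches.

    key_columns=None compares every column (complete match);
    a list of column names compares only those columns (partial match).
    """
    seen = set()
    kept = []
    for row in rows:
        columns = key_columns if key_columns is not None else sorted(row)
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical customer records (not from the chapter).
rows = [
    {"customer": "A", "updated": 1, "city": "Oslo"},
    {"customer": "A", "updated": 3, "city": "Bergen"},
    {"customer": "B", "updated": 2, "city": "Oslo"},
    {"customer": "B", "updated": 2, "city": "Oslo"},
]

# Partial match: sort so the row to keep (here the newest) comes first,
# then match on the "customer" column only.
newest_first = sorted(rows, key=lambda r: r["updated"], reverse=True)
partial = drop_duplicates(newest_first, key_columns=["customer"])

# Complete match: rows must be identical in all columns, so it does not
# matter which copy is removed.
complete = drop_duplicates(rows)
```

Here `partial` keeps one row per customer (the most recently updated one), while `complete` only drops the exact copy of customer B's row.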
Filtering
Split data into separate tables
based on characteristics
same columns but different rows
Used for imputed data / test and train sets
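Sketch of filtering as a split, assuming a hypothetical `age_imputed` flag column that marks rows whose value was filled in. The `split_by` helper and the records are illustrative only.

```python
def split_by(rows, predicate):
    """Split rows into two tables: those matching the predicate and the rest."""
    matching, rest = [], []
    for row in rows:
        (matching if predicate(row) else rest).append(row)
    return matching, rest


# Hypothetical records; "age_imputed" is an assumed flag column.
rows = [
    {"id": 1, "age": 34, "age_imputed": False},
    {"id": 2, "age": 41, "age_imputed": True},
    {"id": 3, "age": 29, "age_imputed": False},
]

# Both resulting tables have the same columns but different rows.
imputed, observed = split_by(rows, lambda r: r["age_imputed"])
```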
Sampling
select small data sets that mirror population
datasets to build models
datasets to evaluate models
ML findings are generalizable
ML findings are capable of predicting the future
Preservation of processing power and analysis time
downsampling
unbalanced datasets
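Sketch of downsampling an unbalanced dataset, assuming the goal is to cut every class to the size of the smallest one. The `downsample` helper, the `churned` label, and the 5-to-95 class split are hypothetical.

```python
import random
from collections import defaultdict


def downsample(rows, label, seed=0):
    """Randomly reduce each class to the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label]].append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    return balanced


# Hypothetical unbalanced data: 5 positive rows, 95 negative rows.
rows = [{"id": i, "churned": i < 5} for i in range(100)]
balanced = downsample(rows, "churned")
```

All rows of the rare class are kept; only the majority class loses rows.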
random sampling
Holdback sample is the first set of data extracted
validation
cross-validation
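Sketch of random sampling with a holdback sample followed by cross-validation on the remaining rows. The 20% holdback fraction, the 5 folds, and both helper names are assumptions for illustration, not values from the chapter.

```python
import random


def holdback_split(rows, holdback_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and extract the holdback sample first."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdback = round(len(shuffled) * holdback_fraction)
    return shuffled[n_holdback:], shuffled[:n_holdback]


def k_folds(rows, k=5):
    """Yield (training, validation) pairs; each fold validates exactly once."""
    folds = [rows[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i for row in fold]
        yield training, validation


rows = list(range(100))  # stand-in for 100 data rows
remaining, holdback = holdback_split(rows)
splits = list(k_folds(remaining, k=5))
```

The holdback sample is set aside before any folds are built, so it never appears in a training or validation fold.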