Data Reduction and Splitting
Unique rows
Data may contain duplicate rows
These duplicates need to be removed
Partial match removal
Removes full rows whose contents are identical in a selected subset of columns
Sort the data so that the rows you want to keep are listed first
Then specify which columns must be identical for rows to count as duplicates and be removed
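A minimal Python sketch of partial match removal follows. It assumes each row is a dictionary keyed by column name. The function name `remove_partial_duplicates`, its parameters, and the customer records are hypothetical illustrations, not the tool's own interface. The rows are sorted so the preferred row comes first, and then only the first row for each combination of key-column values is kept.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence

Row = Dict[str, Any]

def remove_partial_duplicates(
    rows: Iterable[Row],
    sort_key: Callable[[Row], Any],
    key_columns: Sequence[str],
    descending: bool = False,
) -> List[Row]:
    """Sort rows, then keep only the first row for each combination of key_columns values."""
    # Step 1: sort so the row we want to keep is listed first within each duplicate group.
    ordered = sorted(rows, key=sort_key, reverse=descending)
    # Step 2: walk the sorted rows and drop any row whose key-column values were already seen.
    seen = set()
    kept: List[Row] = []
    for row in ordered:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Hypothetical data: keep only the most recent record for each customer.
records = [
    {"customer": "A", "date": "2023-01-05", "amount": 10},
    {"customer": "B", "date": "2023-02-01", "amount": 7},
    {"customer": "A", "date": "2023-03-10", "amount": 12},
]
latest = remove_partial_duplicates(
    records, sort_key=lambda r: r["date"], key_columns=["customer"], descending=True
)
print([(r["customer"], r["amount"]) for r in latest])  # [('A', 12), ('B', 7)]
```

Python's sort is stable, including with `reverse=True`, so rows that tie on the sort key keep their input order. Passing every column name as `key_columns` gives the complete match case.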
Complete match removal
Removes full rows whose contents are identical in all columns
Special case of partial match removal, where every column is selected
Unique function
Keeps only unique rows, discarding any row identical to one already encountered
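The behaviour described for the Unique function can be approximated by keeping a set of rows already seen. This sketch assumes rows are tuples so that they are hashable. `unique_rows` is a hypothetical name, not the tool's API.

```python
from typing import Hashable, Iterable, List, Tuple

def unique_rows(rows: Iterable[Tuple[Hashable, ...]]) -> List[Tuple[Hashable, ...]]:
    """Return rows in their original order, dropping any row identical to an earlier one."""
    seen = set()
    result = []
    for row in rows:
        if row not in seen:  # first time this exact row appears
            seen.add(row)
            result.append(row)
    return result

table = [("A", 1), ("B", 2), ("A", 1), ("C", 3), ("B", 2)]
print(unique_rows(table))  # [('A', 1), ('B', 2), ('C', 3)]
```

The same result can be had with `list(dict.fromkeys(table))`, because dictionaries preserve insertion order in Python 3.7 and later.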
Summarize function
Groups rows into discrete buckets, one for each unique combination of values
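Grouping rows into one bucket per unique combination of values might look like the sketch below. The `summarize` function, the sales data, and the per-bucket sum are hypothetical. The actual Summarize function may offer different aggregations.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]

def summarize(rows: List[Row], group_columns: Sequence[str]) -> Dict[Tuple[Any, ...], List[Row]]:
    """Group rows into one bucket per unique combination of values in group_columns."""
    buckets: Dict[Tuple[Any, ...], List[Row]] = defaultdict(list)
    for row in rows:
        buckets[tuple(row[c] for c in group_columns)].append(row)
    return dict(buckets)

sales = [
    {"region": "North", "product": "X", "units": 3},
    {"region": "South", "product": "X", "units": 5},
    {"region": "North", "product": "X", "units": 2},
    {"region": "North", "product": "Y", "units": 4},
]
groups = summarize(sales, ["region", "product"])
# Example aggregation: total units per bucket.
totals = {key: sum(r["units"] for r in bucket) for key, bucket in groups.items()}
print(totals)  # {('North', 'X'): 5, ('South', 'X'): 5, ('North', 'Y'): 4}
```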
Filtering
Splitting up a set of data into two separate tables
Based on characteristics of the data in the table
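Splitting one table into two by a condition can be sketched as follows. `split_table`, the order data, and the threshold of 100 are illustrative assumptions. Each row goes to exactly one of the two output tables.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]

def split_table(rows: List[Row], condition: Callable[[Row], bool]) -> Tuple[List[Row], List[Row]]:
    """Split rows into (matching, non_matching) tables; every row lands in exactly one."""
    matching: List[Row] = []
    non_matching: List[Row] = []
    for row in rows:
        (matching if condition(row) else non_matching).append(row)
    return matching, non_matching

orders = [
    {"id": 1, "amount": 250},
    {"id": 2, "amount": 40},
    {"id": 3, "amount": 120},
]
large, small = split_table(orders, lambda r: r["amount"] >= 100)
print([r["id"] for r in large], [r["id"] for r in small])  # [1, 3] [2]
```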
Data modifications (Feature engineering)
Once these are complete, the training and test rows must be separated again
Uses beyond separating test and train files
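A common reason to separate training and test rows again is that they were stacked into one table so that a feature engineering step sees both. The sketch below assumes that workflow. A hypothetical `is_train` marker column is added before combining, a category encoding is built over the combined rows, and filtering on the marker splits the table back apart. All names and data are illustrative.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def combine(train: List[Row], test: List[Row]) -> List[Row]:
    """Stack train and test rows into one table, tagging each with an 'is_train' marker."""
    return [dict(r, is_train=True) for r in train] + [dict(r, is_train=False) for r in test]

def separate(rows: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Filter the combined table back into train and test rows, dropping the marker."""
    def strip(r: Row) -> Row:
        return {k: v for k, v in r.items() if k != "is_train"}
    train = [strip(r) for r in rows if r["is_train"]]
    test = [strip(r) for r in rows if not r["is_train"]]
    return train, test

train_rows = [{"color": "red"}, {"color": "blue"}]
test_rows = [{"color": "green"}, {"color": "red"}]

combined = combine(train_rows, test_rows)

# Example feature engineering step: integer codes built from the combined rows,
# so a category that only appears in the test rows still receives a code.
codes: Dict[str, int] = {}
for row in combined:
    row["color_code"] = codes.setdefault(row["color"], len(codes))

train_fe, test_fe = separate(combined)
print([r["color_code"] for r in train_fe], [r["color_code"] for r in test_fe])  # [0, 1] [2, 0]
```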
Imputation of missing data
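Filtering can also isolate rows with missing values so that they can be imputed separately. This sketch assumes missing values are stored as `None` and uses mean imputation as an example technique. The notes above do not say which imputation method is used, and `impute_mean` is a hypothetical helper.

```python
from statistics import mean
from typing import Any, Dict, List

Row = Dict[str, Any]

def impute_mean(rows: List[Row], column: str) -> List[Row]:
    """Fill missing (None) values in `column` with the mean of the observed values."""
    # Filter the table into two parts: rows with a value and rows without one.
    complete = [r for r in rows if r[column] is not None]
    missing = [r for r in rows if r[column] is None]
    # statistics.mean raises StatisticsError if there are no observed values.
    fill_value = mean(r[column] for r in complete)
    # Impute the missing part, then recombine the two tables (complete rows first).
    filled = [{**r, column: fill_value} for r in missing]
    return complete + filled

patients = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 50},
    {"id": 4, "age": None},
]
result = impute_mean(patients, "age")
print([(r["id"], r["age"]) for r in result])
```

The recombined table lists the complete rows before the imputed ones. Sort on an identifier column afterwards if the original order matters.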