Data Reduction and Splitting
Unique Rows
Data contains duplicate rows
Duplicates need to be removed
2 Ways to remove Duplicates:
Complete Match Removal
Removal of full rows based on identical content in all columns
Special case of partial match removal
Partial Match Removal
Removal of full rows based on identical content of a few columns
Data is sorted first so that the preferred row in each duplicate group is the one kept
Specification of which columns should be identical for duplicates to be removed
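The two removal modes above can be sketched with pandas. This is a minimal illustration, not the tool the notes refer to: the library choice, the sample table, and the column names (customer, city, amount) are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "city": ["X", "X", "Y", "Y", "Z"],
    "amount": [10, 10, 5, 7, 9],
})

# Complete match removal: a row is dropped only if every column
# matches a row that was already seen.
complete = df.drop_duplicates()

# Partial match removal: sort so the preferred row comes first,
# then keep the first row per value of the chosen column(s).
partial = (
    df.sort_values("amount", ascending=False)
      .drop_duplicates(subset=["customer"], keep="first")
)
```

Complete match removal is the special case where the subset is every column. Sorting before the partial removal decides which row of each duplicate group survives (here, the one with the largest amount).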
Unique Function
Function that keeps only unique rows
Discarding any row containing data already encountered
Summarize Function
Group unique configurations into discrete buckets
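As a rough pandas analogue of the Unique and Summarize functions, the sketch below assumes drop_duplicates stands in for "unique" and a groupby count stands in for "summarize"; the data and the column names (color, size, count) are hypothetical.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "color": ["red", "red", "blue", "red", "blue"],
    "size": ["S", "S", "M", "L", "M"],
})

# "Unique": keep one copy of each distinct row, discarding repeats.
unique_rows = df.drop_duplicates().reset_index(drop=True)

# "Summarize": group each distinct configuration into a bucket
# and count how many rows fall into it.
summary = (
    df.groupby(["color", "size"])
      .size()
      .reset_index(name="count")
)
```

Both results have one row per distinct configuration; the summary additionally records how many original rows each bucket held.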
Filtering
Splits up a set of data into 2 tables based on characteristics
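A filter of this kind might look as follows in pandas. The condition (score of at least 50) and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "score": [35, 80, 50, 12],
})

# Boolean mask describing the characteristic to filter on.
mask = df["score"] >= 50

# Two tables: rows that satisfy the condition and rows that do not.
passed = df[mask]
failed = df[~mask]
```

Every input row lands in exactly one of the two tables, and both keep the same columns as the input.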
Union Training and Test data
Make sure changes are made uniformly to training and test rows
Non-uniform modifications will harm model's predictive ability
Once data modifications are concluded, training and test rows need to be separated
Data transformations applying only to a subset of available data
Data can be filtered into 2 separate tables
Containing the same columns but different rows
During imputation of missing data
Union Example:
Union to combine both training and test data into a new combined set
A filter can be used afterwards to separate the training and test rows again
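The union-then-filter workflow can be sketched in pandas as below. The marker column is_train, the sample columns (age, label), and the choice of median imputation are assumptions for illustration; the notes do not specify them.

```python
import pandas as pd

# Hypothetical training and test tables; test has no label column.
train = pd.DataFrame({"age": [20.0, None, 40.0], "label": [0, 1, 0]})
test = pd.DataFrame({"age": [None, 30.0]})

# Union: stack both tables, tagging each row with its origin
# (is_train is a helper column invented for this sketch).
combined = pd.concat(
    [train.assign(is_train=True), test.assign(is_train=False)],
    ignore_index=True,
)

# Apply one modification uniformly to all rows, here median
# imputation of missing ages (an assumed example transformation).
combined["age"] = combined["age"].fillna(combined["age"].median())

# Filter: separate the rows again and drop the helper column.
train_out = combined[combined["is_train"]].drop(columns="is_train")
test_out = combined[~combined["is_train"]].drop(columns=["is_train", "label"])
```

Tagging the rows before the union is what makes the later filter possible; without a marker there would be no reliable way to tell training rows from test rows once they are stacked.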