Data Reduction/Splitting
Unique Rows
It's not uncommon for data to contain duplicate rows. When this happens, duplicates should be removed.
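A minimal sketch of duplicate removal, assuming rows are held as plain Python dictionaries with hashable values. The helper name `unique_rows` and the sample data are hypothetical, chosen only for illustration. Dataframe libraries offer an equivalent operation, for example `DataFrame.drop_duplicates` in pandas.

```python
def unique_rows(rows):
    """Return rows with exact duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for row in rows:
        # Build a hashable fingerprint of the row (assumes hashable values).
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


# Hypothetical sample data: the third row duplicates the first.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 1, "color": "red"},
]
deduped = unique_rows(rows)
```

Here a row counts as a duplicate only when every column matches. Deduplicating on a subset of columns would be a different design choice.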
Filtering
Often a necessary and convenient tool for splitting a set of data into two separate tables based on a characteristic.
It is important to union the training and test data so that changes are applied uniformly to training and test rows.
- Non-uniform modifications to the training and test data sets will harm a model's predictive ability when it is tested.
Once all data modifications (called feature engineering) are concluded, the training and test rows must be separated again.
Only the training set can be used to train machine learning models. Separating the rows is often as simple as moving rows with a certain value in the target column into the training set and those without it into the test set.
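The union, modify, and re-split workflow described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the helper `filter_split`, the column names `size_cm`, `size_m`, and `label`, and the convention that test rows have no target value (`None`) are all hypothetical.

```python
def filter_split(rows, predicate):
    """Split rows into two tables: (rows matching predicate, all other rows)."""
    matching, non_matching = [], []
    for row in rows:
        (matching if predicate(row) else non_matching).append(row)
    return matching, non_matching


# Hypothetical data: training rows carry a target value, test rows do not.
train = [
    {"size_cm": 10.0, "label": "small"},
    {"size_cm": 250.0, "label": "large"},
]
test = [{"size_cm": 40.0, "label": None}]

# Union both sets (copying each row) so every change touches all rows alike.
combined = [dict(row) for row in train + test]

# Example modification applied uniformly: add the size in metres.
for row in combined:
    row["size_m"] = row["size_cm"] / 100

# Separate again, using the presence of a target value as the filter.
train_rows, test_rows = filter_split(combined, lambda r: r["label"] is not None)
```

Because the new column is computed once on the combined table, the training and test rows are guaranteed to share the same columns and the same transformation. Only `train_rows` would then be passed to model training.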