Data Reduction and Splitting (Sampling (Holdback sample (First set of data…

- - - - Removal based upon duplicates in some columns
    - - Removal based upon duplicates in all columns
    - - Unique function
        
        Function that keeps only unique rows, discarding rows already encountered
      - Summarize function
        
        Each unique id can be grouped into buckets, making it easy to count the number of occurrences
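
The two removal strategies and the summarize step above can be sketched in plain Python. This is only an illustration, not any particular tool's implementation: the helper names `unique_rows` and `summarize`, and the sample `id`/`city` rows, are hypothetical and chosen for this example.

```python
from collections import Counter

def unique_rows(rows, columns=None):
    """Keep the first occurrence of each row.

    If `columns` is given, two rows count as duplicates when they match
    on those columns only; otherwise all columns are compared.
    """
    seen = set()
    kept = []
    for row in rows:
        cols = columns if columns is not None else sorted(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:  # discard rows already encountered
            seen.add(key)
            kept.append(row)
    return kept

def summarize(rows, column):
    """Group rows into buckets by `column` and count each bucket."""
    return Counter(row[column] for row in rows)

# Hypothetical sample data
rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 1, "city": "Oslo"},    # duplicate in all columns
    {"id": 1, "city": "Bergen"},  # duplicate in the "id" column only
    {"id": 2, "city": "Oslo"},
]

all_cols = unique_rows(rows)               # duplicates in all columns removed
by_id = unique_rows(rows, columns=["id"])  # duplicates in some columns removed
counts = summarize(rows, "id")             # occurrences per unique id
```

Running `summarize` before removing anything shows how many rows each id contributes, which helps decide whether deduplicating on a subset of columns would discard meaningful data.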
- - - - Set aside fold 1, create a model combining the data in folds 2-5, and use fold 1 for validation
      - Set aside fold 2, create a model with folds 1, 3, 4, and 5, and use fold 2 for validation
      - Calculate the accuracy for each fold, and average the accuracies to get an overall accuracy
      - Restructure and clean the data to attempt to train a better model
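
The fold-by-fold procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: `cross_validate` and `fit_majority` are hypothetical names, rows are assigned to folds round-robin rather than shuffled, and the "model" is a stand-in that always predicts the most common training label. A real model would replace `fit_majority`.

```python
from collections import Counter

def cross_validate(rows, labels, fit, k=5):
    """Return per-fold accuracies and their average.

    Assumes len(rows) >= k so that no fold is empty.
    """
    n = len(rows)
    # Round-robin fold assignment: fold i holds indices i, i+k, i+2k, ...
    folds = [list(range(i, n, k)) for i in range(k)]
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(n) if i not in held]
        # Train on the other k-1 folds
        model = fit([rows[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        # Validate on the fold that was set aside
        correct = sum(model(rows[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies, sum(accuracies) / len(accuracies)

def fit_majority(train_rows, train_labels):
    """Stand-in model (assumption): always predict the majority label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda row: majority

# Hypothetical sample data: 10 rows, 5 folds of 2 rows each
rows = list(range(10))
labels = ["a", "a", "a", "b", "a", "a", "b", "a", "a", "a"]

accuracies, overall = cross_validate(rows, labels, fit_majority, k=5)
```

If the overall accuracy is unsatisfactory, the last step in the list applies: restructure or clean the data, then rerun the same loop to compare the new average against the old one.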