Chapter 12 - Data Reduction & Splitting (12.3: Sampling (data gathered…

- - - - data is sorted by listing the rows to keep first, then listing which columns are duplicates and should be removed
- - - - will be used for final evaluation of model produced
      - usually 20% of data
      - evaluates the model's performance
      - this is RANDOMIZED
    - - choose the right # of folds, balancing it against the processing power needed for cross-validation
    - - model is then given access to data in fold 1 to compare predicted vs. target value
        
        if correct in 3 out of 4 cases when predicting 0 and 1 values, accuracy is .75 (validation score)
    - - combine folds 1, 3, 4, 5 for training and validate against fold 2
        
        right 2/4 times - accuracy of .50
      - against fold 3 - right 1/4 times - accuracy of .25
      - against fold 4 - right 3/4 times - accuracy of .75
      - against fold 5 - right 1/4 times - accuracy of .25
      - Add 3+2+1+3+1 = 10
        
        10/20 correct total - accuracy of .50
        
        random assignment would give .50 accuracy, so the model is no better than a blind guess!
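
The fold-by-fold arithmetic in these notes can be checked with a few lines of Python. This is only a sketch: the per-fold counts (3, 2, 1, 3, 1 correct out of 4 rows each) come from the notes, while the variable names and the .50 baseline constant are assumptions made for illustration, not part of the chapter.

```python
# Correct predictions per validation fold, copied from the notes (4 rows per fold).
correct_per_fold = {1: 3, 2: 2, 3: 1, 4: 3, 5: 1}
ROWS_PER_FOLD = 4  # assumption: every fold holds exactly 4 rows, as in the notes

# Accuracy of each fold on its own (the per-fold validation score).
fold_accuracy = {fold: hits / ROWS_PER_FOLD for fold, hits in correct_per_fold.items()}

# Cross-validation score: pool the correct predictions across all folds.
total_correct = sum(correct_per_fold.values())
total_rows = ROWS_PER_FOLD * len(correct_per_fold)
cv_accuracy = total_correct / total_rows

# Assumed baseline: random guessing on a 0/1 target scores .50, per the notes.
BASELINE = 0.50
beats_baseline = cv_accuracy > BASELINE

for fold, acc in fold_accuracy.items():
    print(f"fold {fold}: accuracy {acc:.2f}")
print(f"cross-validation accuracy: {cv_accuracy:.2f}")
print(f"better than random guessing: {beats_baseline}")
```

Pooling the counts (10 correct out of 20) rather than looking at any single fold is what exposes the weak model: folds 1 and 4 alone score .75, which would look misleadingly good.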