Chapter 5: Overfitting and Its Avoidance
Generalization
Table Model
Memorizes the training data and performs no generalization
Fundamental Concepts of Data Science
Generalization:
Property of a model or modeling process by which the model applies to data that were not used to build the model
Overfitting
Tendency of data mining procedures to tailor models to the training data, at the expense of generalization to previously unseen data points
Tempting solution? Build a more complex model; it should better capture the real complexities of the application and be more accurate
But all data mining procedures tend to overfit the data, so added complexity is no cure in itself
Overfitting Examined
Fitting Graph: Shows the accuracy of a model as a function of complexity
How do we examine it?
Holdout data: data not used to build the model
Accuracy on holdout data estimates the model's generalization performance
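A minimal sketch of how a fitting graph can be produced, assuming scikit-learn and a synthetic dataset (both are illustrative choices, not from these notes): sweep model complexity and record training vs. holdout accuracy.

```python
# Fitting-graph sketch: accuracy as a function of complexity (here, tree depth).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

for depth in range(1, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth,
          tree.score(X_train, y_train),  # training accuracy keeps rising
          tree.score(X_hold, y_hold))    # holdout accuracy peaks, then falls
```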
Overfitting in Tree Induction
Tree-structured models are very flexible, so tree induction can keep adding splits until it fits the training data arbitrarily well
Why is Overfitting bad?
As a model gets more complex, it is allowed to pick up harmful spurious correlations
These spurious correlations produce incorrect generalizations in the model, causing performance on unseen data to decline
Overfitting in Mathematical Functions
The equation can be made more complex by adding more variables (attributes)
A dataset may end up with a large number of attributes; careful manual attribute selection helps
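A hedged sketch of the same idea for a mathematical function, using only NumPy (the true function, noise level, and degrees are invented for illustration): adding higher-degree terms keeps lowering training error, while holdout error eventually worsens.

```python
# Polynomial overfitting sketch: more terms lower training error,
# but holdout error eventually rises.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: 1.5 * x + 0.5            # the simple, true relationship
x_train = rng.uniform(-1, 1, 20)
x_hold = rng.uniform(-1, 1, 20)
y_train = true_f(x_train) + rng.normal(0, 0.2, 20)
y_hold = true_f(x_hold) + rng.normal(0, 0.2, 20)

for degree in (1, 3, 6, 9):
    coef = np.polyfit(x_train, y_train, degree)  # add more polynomial terms
    mse = lambda x, y: np.mean((np.polyval(coef, x) - y) ** 2)
    print(degree, mse(x_train, y_train), mse(x_hold, y_hold))
```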
From Holdout Evaluation to Cross-Validation
Cross-validation: a more sophisticated holdout training and testing procedure
Not only gives us a simple estimate of the generalization performance, but also some statistics on the estimated performance, such as mean and variance
Variance: critical for assessing confidence in the performance estimate
Makes better use of a limited dataset
Computes its estimates over all the data by performing multiple splits and systematically swapping samples for testing
Begins by splitting a labeled dataset into k partitions called folds
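A minimal k-fold cross-validation sketch, assuming scikit-learn and a synthetic stand-in for a labeled dataset:

```python
# 10-fold cross-validation: each fold serves once as the test set
# while the remaining folds are used for training.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(scores.mean(), scores.std())  # mean and spread of the performance estimate
```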
The Churn Dataset Revisited
Do we trust this number?
Overfitting is a possibility
Next step: analyze the average accuracy across the folds for classification trees
Then, compare the fold accuracies between logistic regression and classification trees
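A sketch of that comparison; `churn.csv` and its `churn` label column are hypothetical placeholders for whatever form the churn dataset actually takes.

```python
# Compare fold-by-fold cross-validation accuracies of two model types.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("churn.csv")                  # hypothetical file name
X, y = df.drop(columns="churn"), df["churn"]   # hypothetical label column

models = [("logistic regression", LogisticRegression(max_iter=1000)),
          ("classification tree", DecisionTreeClassifier(random_state=0))]
for name, model in models:
    scores = cross_val_score(model, X, y, cv=10)
    print(name, scores, scores.mean())  # fold accuracies, then the average
```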
Learning Curves
A plot of the generalization performance against the amount of training data
The marginal advantage of having more data decreases, so the learning curve becomes less steep
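A minimal learning-curve sketch with scikit-learn's `learning_curve` helper, on a synthetic dataset:

```python
# Generalization accuracy as a function of training-set size.
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, random_state=0)
sizes, _, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=[0.1, 0.25, 0.5, 0.75, 1.0], cv=5)

for n, score in zip(sizes, test_scores.mean(axis=1)):
    print(n, score)  # climbs steeply at first, then flattens out
```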
Fitting Graph vs Learning Curve
Learning curve: shows the generalization performance (the performance only on testing data), plotted against the amount of training data used
Fitting graph: shows the generalization performance as well as the performance on the training data, but plotted against model complexity
Overfitting Avoidance and Complexity Control
To avoid overfitting, we control the complexity of the models induced from the data
1st step: Examine complexity control in tree induction
Avoiding Overfitting with Tree Induction
Main Problem: Tree induction will keep growing the tree to fit the training data until it creates pure leaf nodes
Tree Induction Techniques to Avoid Overfitting
(i) To stop growing the tree before it gets too complex
(ii) To grow the tree until it is too large, then "prune" it back, reducing its size (and therefore its complexity); both techniques are sketched below
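Both techniques can be sketched with scikit-learn's tree parameters (the thresholds below are illustrative assumptions, not recommendations): `min_samples_leaf` stops growth early, while `ccp_alpha` grows the tree and then prunes it back via cost-complexity pruning.

```python
# (i) Pre-pruning: stop splitting once leaves would get too small.
# (ii) Post-pruning: grow fully, then cut back with cost-complexity pruning.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

stopped = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Both controlled trees end up far smaller than the fully grown one.
print(full.get_n_leaves(), stopped.get_n_leaves(), pruned.get_n_leaves())
```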
A General Method for Avoiding Overfitting
Nested Holdout Testing
Build models on a training subset and pick the best model based on a separate testing subset; the former is called the subtraining set and the latter the validation set
Validation Set: separate from the final test set, on which we are never going to make any modeling decisions
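A minimal nested-holdout sketch on synthetic data: the complexity choice is made on the validation set, and the untouched test set is used exactly once at the end.

```python
# Nested holdout: the training data is split again into subtraining + validation.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
X_sub, X_val, y_sub, y_val = train_test_split(X_train, y_train, random_state=0)

# Pick the best complexity (tree depth) using only the validation set.
best_depth = max(
    range(1, 12),
    key=lambda d: DecisionTreeClassifier(max_depth=d, random_state=0)
                  .fit(X_sub, y_sub).score(X_val, y_val))

# Retrain on the full training set; touch the test set only once, at the end.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0)
final.fit(X_train, y_train)
print(best_depth, final.score(X_test, y_test))
```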
Sequential Forward Selection (SFS): uses nested holdout testing to add attributes one at a time, keeping an attribute only when it improves validation performance
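A hedged sketch of SFS built on the nested holdout idea (the feature indexing and the stopping rule are illustrative assumptions):

```python
# Sequential forward selection: greedily add the attribute that most
# improves validation accuracy; stop when nothing improves it.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_sub, X_val, y_sub, y_val = train_test_split(X, y, random_state=0)

def val_score(cols):
    model = DecisionTreeClassifier(random_state=0).fit(X_sub[:, cols], y_sub)
    return model.score(X_val[:, cols], y_val)

selected, best_score = [], 0.0
while True:
    candidates = [c for c in range(X.shape[1]) if c not in selected]
    if not candidates:
        break
    score, col = max((val_score(selected + [c]), c) for c in candidates)
    if score <= best_score:   # no remaining attribute improves validation accuracy
        break
    selected.append(col)
    best_score = score

print(selected, best_score)
```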