Chapter 5: Overfitting and Its Avoidance
Context
Generalization
Table Model
Memorizes training data and performs no generalizations
A model that looked perfect could be completely useless in practice
Generalization = property of a model or modeling process, whereby the model applies to data that were not used to build the model
Want models that apply to the general population as well
Overfitting
Overfitting = tendency of data mining procedures to tailor models to training data
Done at expense of generalization to unseen data points
Recognize overfitting and manage complexity
Fitting graph = shows accuracy of a model as a function of complexity
X-axis measures complexity of model
Y-axis measures the error
Shows generalization performance as well as performance on the training data, both plotted against model complexity
Holdout data = data that is held out from initial model creation
Used to estimate generalization performance
Sometimes called "test set"
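A holdout split can be sketched in a few lines of stdlib Python (the function name, fraction, and seed here are illustrative, not from the text):

```python
import random

def holdout_split(data, holdout_frac=0.3, seed=0):
    """Shuffle the data, then hold out a fraction for estimating
    generalization performance; the rest is used for training."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_hold = int(len(shuffled) * holdout_frac)
    return shuffled[n_hold:], shuffled[:n_hold]  # (training set, holdout set)
```

The key property is that the two sets are disjoint, so the holdout estimate is not contaminated by data used to build the model.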
Greater complexity leads to greater overfitting
Adding more variables lets a model fit the training set more closely, increasing overfitting
Overfitting often causes models to become worse
Picked up idiosyncrasies of the data-set that do not represent the general population
The model "generalizes" from these idiosyncrasies, which hurts performance on new data
Affects all model types
No general way to determine overfitting level in advance
Tree Induction
Continue to split data and it will become pure
Procedure that grows trees until leaves are pure tends to overfit
Purity can also reflect noise in the data -> the tree ends up fitting idiosyncrasies rather than general patterns
Complexity of tree lies in # of nodes
Extremely flexible modeling
Avoiding Overfitting
Stop growing tree before it gets too complex
Grow tree until it is too complex, then prune the leaves, reducing size and complexity
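The first strategy above (stopping early) can be sketched with a tiny single-attribute tree grower; `max_depth` and `min_leaf` are the stopping criteria. All names are illustrative, and this is a minimal sketch assuming one numeric attribute:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def grow_tree(xs, ys, max_depth=3, min_leaf=2):
    """Recursively split on the best threshold of a single numeric
    attribute; stop early (pre-pruning) when max_depth is reached,
    the node is pure, or a split would create a too-small leaf."""
    if max_depth == 0 or len(set(ys)) == 1:
        return majority(ys)                      # leaf node
    best = None
    for t in sorted(set(xs))[1:]:                # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        # impurity = count of minority-class examples on each side
        err = (len(left) - Counter(left).most_common(1)[0][1]) + \
              (len(right) - Counter(right).most_common(1)[0][1])
        if best is None or err < best[0]:
            best = (err, t)
    if best is None:                             # no allowed split: make a leaf
        return majority(ys)
    t = best[1]
    lx, ly = zip(*[(x, y) for x, y in zip(xs, ys) if x < t])
    rx, ry = zip(*[(x, y) for x, y in zip(xs, ys) if x >= t])
    return (t, grow_tree(list(lx), list(ly), max_depth - 1, min_leaf),
               grow_tree(list(rx), list(ry), max_depth - 1, min_leaf))

def predict(tree, x):
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree
```

Lowering `max_depth` or raising `min_leaf` yields a smaller, less complex tree; the second strategy (post-pruning) instead grows the full tree and then collapses subtrees that do not improve holdout performance.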
Cross Validation
Sophisticated holdout testing procedure
Also estimates the variance across folds, which is critical for assessing confidence in the performance estimate
Estimates are computed over all data
Multiple splits and systematic swaps
k partitions called folds
Fold accuracies vary, so report their average together with the standard deviation
Typically k = 5 or 10
Each iteration creates one model and one estimate of generalization performance
Iterates training and testing k times, in a particular way
Can compute average and standard deviation
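The fold mechanics above can be sketched in stdlib Python. Function names are illustrative, and `fit`/`score` stand in for any model-building and evaluation routines:

```python
import statistics

def kfold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k systematic splits: each
    example lands in exactly one test fold."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cross_validate(xs, ys, fit, score, k=5):
    """Iterate training and testing k times; return the average and
    standard deviation of the k generalization-performance estimates."""
    accs = []
    for train, test in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        accs.append(score(model, [xs[i] for i in test],
                                 [ys[i] for i in test]))
    return statistics.mean(accs), statistics.stdev(accs)
```

The same folds can be reused for two different learners (e.g., logistic regression and a classification tree) so that their fold accuracies are directly comparable.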
Learning Curves
All else equal, generalization performance of data-driven modeling generally improves as more training data become more available, to a point
Plot of this correlation is called a learning curve
Steep initially, then gradually less steep; may eventually flatten out
Generalization performance plotted against the amount of training data used
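The learning-curve idea can be sketched by training on progressively larger subsets and scoring each model on a fixed holdout set (the function names and the 30% holdout fraction are illustrative assumptions):

```python
import random

def learning_curve(xs, ys, fit, score, sizes, test_frac=0.3, seed=0):
    """Accuracy on a fixed holdout set as the training size grows:
    one generalization-performance estimate per training-set size."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, pool = idx[:n_test], idx[n_test:]
    curve = []
    for n in sizes:
        train = pool[:n]                          # first n training examples
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        curve.append(score(model, [xs[i] for i in test],
                                  [ys[i] for i in test]))
    return curve
```

Plotting `curve` against `sizes` gives the learning curve: typically steep at first, flattening once additional data stops helping.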
Model Regularization
Method for reining in complexity to avoid overfitting, typically by adding a penalty for complexity to the objective being optimized
Business Application
Modeling laboratory
Expensive but typically very worth it
Work to understand actual use scenarios so as to make lab setting as true as possible
Data as an asset
Once the learning curve flattens, investments in additional training data are likely not worthwhile
Sidenotes
Fundamental trade-off between model complexity and possibility of overfitting
Recognize overfitting and manage complexity
Accuracy of a model depends on complexity
Greater complexity leads to greater overfitting
Mathematical functions can become more complex through addition of more variables
Adding nonlinear versions of attribute variables (e.g., squared terms) lets a model fit patterns beyond a truly linear one, increasing complexity
Mistrust any performance measurement done on the training set
Because overfitting is a very real possibility
Compare fold accuracies of logistic regression and classification trees
All else equal, generalization performance of data-driven modeling generally improves as more training data become more available, to a point
Test data should be independent of model building