Chapter 5: Overfitting and Its Avoidance
Generalization
How well a model applies to data that were not used to build it
A model that fits the training data but fails to generalize is said to overfit
Overfitting
The tendency of modeling procedures to tailor models to the training data at the expense of generalization to previously unseen data points
Look for patterns, not specific matchups
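The difference between memorizing matchups and finding a pattern can be sketched with a lookup-table model. This is only an illustration on synthetic data: the `usage` feature, the unique customer id, the 10% label noise, and all function names are assumptions made for the example, not part of the chapter.

```python
import random
from collections import Counter

rng = random.Random(0)

def make_rows(start_id, n):
    """Synthetic rows: ((customer_id, usage), label), label = usage > 50 plus noise."""
    rows = []
    for i in range(start_id, start_id + n):
        usage = rng.randint(0, 100)
        label = int(usage > 50)
        if rng.random() < 0.1:          # assumed 10% label noise
            label = 1 - label
        rows.append(((i, usage), label))
    return rows

train = make_rows(0, 500)
holdout = make_rows(500, 500)           # different customer ids than train

def accuracy(predict, rows):
    return sum(predict(f) == y for f, y in rows) / len(rows)

# Table model: memorizes every exact feature vector seen in training.
table = dict(train)
default = Counter(y for _, y in train).most_common(1)[0][0]

def table_predict(features):
    return table.get(features, default)  # unseen rows fall back to the majority class

# Pattern model: a single threshold on usage, chosen on the training data.
best_t = max(range(101), key=lambda t: accuracy(lambda f: int(f[1] > t), train))

def pattern_predict(features):
    return int(features[1] > best_t)

table_train = accuracy(table_predict, train)
table_holdout = accuracy(table_predict, holdout)
pattern_holdout = accuracy(pattern_predict, holdout)
print("table   train/holdout:", table_train, table_holdout)
print("pattern holdout:      ", pattern_holdout)
```

The table model is perfect on the rows it memorized but has nothing to say about new customers, while the threshold rule carries over to the holdout rows.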
Tree Induction
A procedure that grows the tree until the leaves are pure tends to overfit
typically around 100 nodes
Restrict tree size
Why is overfitting bad?
Picks up harmful spurious correlations
improves as more testing is done
Learning Curve
Plot of the generalization performance against the amount of training data
Steep at first, then plateaus
Logistic regression performs better on smaller training sets
Tree induction can represent substantially nonlinear relationships between features and targets, so it tends to do better on larger training sets
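A learning curve can be traced by training the same kind of model on growing prefixes of the training data and scoring each one on a fixed holdout set. The sketch below uses a small histogram classifier on one synthetic feature; the data generator, the 16 bins, and the training sizes are assumptions chosen only to keep the example short.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(4)

def sample(n):
    """Synthetic points: label = x > 0.5, with an assumed 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data

def fit(data, bins):
    """Majority vote per bin; bins with no training data use the overall majority."""
    votes = defaultdict(Counter)
    for x, y in data:
        votes[int(x * bins)][y] += 1
    overall = Counter(y for _, y in data).most_common(1)[0][0]
    model = {b: c.most_common(1)[0][0] for b, c in votes.items()}
    return lambda x: model.get(int(x * bins), overall)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

pool = sample(1000)       # training pool; prefixes of it form the training sets
holdout = sample(1000)    # fixed holdout set used for every point on the curve

curve = []
for n in [10, 30, 100, 300, 1000]:
    predict = fit(pool[:n], bins=16)
    curve.append((n, accuracy(predict, holdout)))

for n, acc in curve:
    print(f"training size {n:5d}  holdout accuracy {acc:.3f}")
```

Plotting `curve` gives the learning curve: holdout accuracy on the vertical axis against training-set size on the horizontal axis.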
Holdout data (test set)
Data with known target values, set aside from the training data and not used to build the model
Fitting graph: model accuracy on training and holdout data as a function of model complexity
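A fitting graph can be sketched by varying a complexity setting and recording accuracy on both the training and the holdout data. Here complexity is the number of bins of a simple histogram classifier, which is an assumption made for the example (the chapter's own graphs use tree size); the synthetic data and the 20% label noise are also assumed.

```python
import random
from collections import Counter, defaultdict

rng = random.Random(1)

def sample(n):
    """Synthetic points: label = x > 0.5, with an assumed 20% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

train, holdout = sample(200), sample(200)

def fit(data, bins):
    """Majority vote per bin; more bins means a more complex model."""
    votes = defaultdict(Counter)
    for x, y in data:
        votes[int(x * bins)][y] += 1
    overall = Counter(y for _, y in data).most_common(1)[0][0]
    model = {b: c.most_common(1)[0][0] for b, c in votes.items()}
    return lambda x: model.get(int(x * bins), overall)

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

complexities = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
train_acc, holdout_acc = [], []
for bins in complexities:
    predict = fit(train, bins)
    train_acc.append(accuracy(predict, train))
    holdout_acc.append(accuracy(predict, holdout))

for bins, tr, ho in zip(complexities, train_acc, holdout_acc):
    print(f"bins {bins:4d}  train {tr:.3f}  holdout {ho:.3f}")
```

Training accuracy can only go up as the bins are refined, while holdout accuracy eventually falls; the widening gap between the two columns is the sign of overfitting.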
Cross Validation
Uses statistics such as the mean and variance of the performance estimates to understand how performance is expected to vary across datasets
Computes its estimates over all the data by performing multiple splits and systematically swapping out samples for testing
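The splitting and swapping described above can be sketched as k-fold cross-validation. The helper names (`k_fold_indices`, `cross_validate`), the midpoint-threshold learner, and the synthetic data are all assumptions for illustration; any learner with a fit step and a scoring step could be plugged in.

```python
import random
import statistics

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, fit, score):
    """Each fold serves once as the test set; the other folds form the training set."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [row for j, row in enumerate(data) if j not in held_out]
        test = [data[j] for j in test_idx]
        scores.append(score(fit(train), test))
    return scores

# Hypothetical learner: threshold halfway between the two class means.
def fit_threshold(train):
    mean0 = statistics.mean(x for x, y in train if y == 0)
    mean1 = statistics.mean(x for x, y in train if y == 1)
    return (mean0 + mean1) / 2

def score_threshold(threshold, test):
    return sum(int(x > threshold) == y for x, y in test) / len(test)

rng = random.Random(2)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:      # assumed 10% label noise
        y = 1 - y
    data.append((x, y))

scores = cross_validate(data, k=5, fit=fit_threshold, score=score_threshold)
print("fold accuracies:", scores)
print("mean:", statistics.mean(scores), "stdev:", statistics.stdev(scores))
```

The mean summarizes expected performance, and the standard deviation across folds gives a rough sense of how much that performance varies from one dataset to another.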
Avoiding Overfitting
Tree Induction
Problem: Tree induction keeps growing the tree until it fits the training data
Solution 1: Stop growing the tree before it gets too complex
Require a minimum number of instances in a leaf
Use a hypothesis test: keep splitting only when the p-value is below a threshold (e.g., 0.05)
Solution 2: Grow the tree until it is too large, then prune it back to reduce its size
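The minimum-instances rule from Solution 1 can be sketched with a tiny tree learner on a single numeric feature. This is a simplified illustration, not the chapter's algorithm: the Gini-based split choice, the `min_leaf` parameter name, and the ten hand-made points are assumptions, and neither the p-value test nor pruning is shown.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def grow(points, min_leaf=1):
    """Grow a tree on (x, label) pairs sorted by x; every leaf keeps >= min_leaf points."""
    labels = [y for _, y in points]
    leaf = {"leaf": True, "predict": Counter(labels).most_common(1)[0][0],
            "n": len(points)}
    if len(set(labels)) == 1:
        return leaf                      # already pure: stop
    best = None
    # Only consider splits that leave at least min_leaf points on each side.
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue                     # cannot split between equal x values
        left, right = labels[:i], labels[i:]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(points)
        if best is None or score < best[0]:
            best = (score, i)
    if best is None:
        return leaf                      # no allowed split: stop early
    i = best[1]
    return {"leaf": False,
            "threshold": (points[i - 1][0] + points[i][0]) / 2,
            "left": grow(points[:i], min_leaf),
            "right": grow(points[i:], min_leaf)}

def count_leaves(tree):
    if tree["leaf"]:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Ten hand-made points with a few noisy labels.
points = list(enumerate([0, 0, 0, 1, 0, 1, 1, 0, 1, 1]))

full = grow(points, min_leaf=1)          # grows until every leaf is pure
restricted = grow(points, min_leaf=5)    # stops much earlier
print("leaves when grown to purity:", count_leaves(full))
print("leaves with min_leaf=5:     ", count_leaves(restricted))
```

Grown to purity, the tree needs a separate leaf for every noisy label; with the minimum-instances limit it settles for one split near the middle.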
Nested Holdout Testing
Sub-training set
Validation set
Take the training set and split it again into a sub-training set and a validation set; nothing is special about the original training set
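Nested holdout testing can be sketched as follows: a complexity setting is chosen using only the sub-training and validation sets, then the model is rebuilt on the whole training set and scored once on the untouched holdout set. The nearest-neighbor model, the candidate values of `k`, the split sizes, and the synthetic data are assumptions made for this example.

```python
import random
from collections import Counter

rng = random.Random(3)

def sample(n):
    """Synthetic points: label = x > 0.5, with an assumed 15% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.15:
            y = 1 - y
        data.append((x, y))
    return data

def knn_predict(train, k, x):
    """Majority label among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def accuracy(train, k, data):
    return sum(knn_predict(train, k, x) == y for x, y in data) / len(data)

data = sample(300)                       # points are i.i.d., so slicing is a random split
training, holdout = data[:200], data[200:]
sub_train, validation = training[:140], training[140:]   # split the training set again

# Choose complexity using only the sub-training and validation sets.
candidates = [1, 3, 5, 9, 15, 25]
best_k = max(candidates, key=lambda k: accuracy(sub_train, k, validation))

# Rebuild on the full training set and evaluate once on the holdout set.
final_score = accuracy(training, best_k, holdout)
print("chosen k:", best_k, "holdout accuracy:", final_score)
```

Because the holdout set played no part in choosing `best_k`, `final_score` remains an honest estimate of generalization performance.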
Nested Cross-Validation
Summary
Trade-off of model complexity and overfitting
An overfit model will not generalize well to other data, even data drawn from the same population
Recognize overfitting with a holdout set
Regularization techniques include tree pruning, feature selection, and adding complexity penalties to objective functions