Ch.5 Overfitting and its Avoidance
Generalization is the property of a model or modeling process whereby the model applies to data that were not used to build the model.
Overfitting: a model that fits the training data perfectly (or nearly so) but takes too many factors into account, picking up quirks of that particular data set, which reduces the quality of the model on new data.
Holdout Data and Fitting Graphs
A fitting graph shows model performance (accuracy or error) on the training data and on the holdout data as a function of model complexity, making visible the point where holdout performance starts to degrade due to overfitting.
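As a rough illustration of a fitting graph, the sketch below uses synthetic data and a deliberately simple model, neither of which comes from the chapter: `make_data`, `fit_bins`, and the 20% label noise are all hypothetical choices. The model predicts the majority label inside equal-width bins, so the number of bins stands in for model complexity.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical data: x uniform in [0, 1); true label is x >= 0.5,
    flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def fit_bins(train, n_bins):
    """Majority-vote label per equal-width bin; more bins = more complex model."""
    ones = [0] * n_bins
    total = [0] * n_bins
    for x, y in train:
        b = min(int(x * n_bins), n_bins - 1)
        ones[b] += y
        total[b] += 1
    # Bins with no training examples fall back to the overall majority label.
    default = int(2 * sum(ones) >= len(train))
    return [int(2 * o >= t) if t else default for o, t in zip(ones, total)]

def accuracy(model, data):
    n_bins = len(model)
    hits = sum(model[min(int(x * n_bins), n_bins - 1)] == y for x, y in data)
    return hits / len(data)

rng = random.Random(0)
train = make_data(200, rng)
holdout = make_data(200, rng)

# One row of the fitting graph per complexity level.
complexities = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
fitting_graph = []
for n_bins in complexities:
    model = fit_bins(train, n_bins)
    fitting_graph.append((n_bins, accuracy(model, train), accuracy(model, holdout)))

for n_bins, train_acc, holdout_acc in fitting_graph:
    print(f"{n_bins:4d} bins  train={train_acc:.3f}  holdout={holdout_acc:.3f}")
```

Because each bin count refines the previous one, training accuracy can only stay level or rise as bins are added. Holdout accuracy typically peaks at a low bin count and then falls, and the gap between the two columns is what a fitting graph makes visible.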
Overfitting in tree induction
Too many branches: a tree grown until its leaves are pure fits the training data but generalizes poorly.
Overfitting in mathematical functions
Reduce the importance of outliers
Example: Overfitting linear functions
Flower data with outliers: fitting the outliers too closely distorts the model. A more general fit would have been better.
Example: Why is overfitting bad?
A model tailored to the training data loses quality on new data: the error rate on unseen data increases.
Sidebar: Building a modeling "laboratory"
From holdout evaluation to cross-validation
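Cross-validation makes better use of a limited data set. The data are split into k folds; each fold serves once as the holdout set while the model is trained on the remaining folds, and the k results are averaged to estimate performance.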
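The procedure can be sketched in plain Python as follows. This is a minimal illustration, not the chapter's own code: the helper names (`k_fold_indices`, `cross_validate`), the majority-class "model", and the 70/30 toy data are all hypothetical.

```python
import random
from statistics import mean, pstdev

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_fn, score_fn, seed=0):
    """Each fold is the holdout set once; the model is trained on the other k-1 folds."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, holdout_idx in enumerate(folds):
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        holdout = [data[j] for j in holdout_idx]
        model = train_fn(train)
        scores.append(score_fn(model, holdout))
    return scores

# Hypothetical toy model: always predict the majority class of the training set.
def train_majority(train):
    ones = sum(y for _, y in train)
    return int(2 * ones >= len(train))

def accuracy(model, holdout):
    return sum(model == y for _, y in holdout) / len(holdout)

# Hypothetical data: 70 positive and 30 negative examples (the feature is unused here).
data = [(i, 1) for i in range(70)] + [(i, 0) for i in range(30)]

scores = cross_validate(data, k=10, train_fn=train_majority, score_fn=accuracy)
print("fold accuracies:", [round(s, 2) for s in scores])
print("mean:", round(mean(scores), 3), "std:", round(pstdev(scores), 3))
```

Reporting the spread across folds alongside the mean gives a sense of how much the estimate depends on which examples happened to land in the holdout set.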
The churn data revisited
A plot of the generalization performance against the amount of training data (a learning curve).
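A plot of generalization performance against the amount of training data is commonly called a learning curve. The sketch below computes the points of such a curve on synthetic data rather than the churn data; the data generator, the single-threshold model, and the training sizes are all hypothetical choices made for illustration.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical data: x uniform in [0, 1); true label is x >= 0.5,
    flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x >= 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def fit_threshold(train):
    """Pick the threshold t on a 0.01 grid that maximizes training accuracy
    of the rule 'predict 1 if x >= t'."""
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: sum((x >= t) == y for x, y in train))

def accuracy(t, data):
    return sum((x >= t) == y for x, y in data) / len(data)

rng = random.Random(1)
pool = make_data(2000, rng)      # training examples are drawn from this pool
holdout = make_data(1000, rng)   # fixed holdout set used at every training size

sizes = [10, 20, 50, 100, 200, 500, 1000, 2000]
learning_curve = [(n, accuracy(fit_threshold(pool[:n]), holdout)) for n in sizes]

for n, acc in learning_curve:
    print(f"{n:5d} training examples  holdout accuracy={acc:.3f}")
```

Keeping the holdout set fixed while only the training size changes is what makes the points comparable. With 20% label noise, holdout accuracy cannot be expected to rise much above 0.8 no matter how much training data is added.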
Overfitting avoidance and complexity control
Avoiding overfitting with tree induction
Either stop growing the tree before it gets too complex, or grow it very large and then prune it back.
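The first strategy (stopping early) can be sketched with a minimum leaf size. The code below is a simplified illustration under stated assumptions: it handles a single numeric feature, chooses splits by training misclassification count rather than information gain, and uses synthetic data; `grow`, `leaves`, and `min_leaf` are hypothetical names. It does not cover the second strategy, pruning a large tree back.

```python
import random

def majority(labels):
    return int(2 * sum(labels) >= len(labels))

def errors(labels):
    """Training errors made by predicting the majority label."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)

def grow(points, min_leaf):
    """points: list of (x, y) sorted by x. A split is allowed only if both
    children keep at least `min_leaf` examples and training errors drop."""
    labels = [y for _, y in points]
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # cannot split between identical x values
        err = errors(labels[:i]) + errors(labels[i:])
        if best is None or err < best[0]:
            best = (err, i)
    if best is None or best[0] >= errors(labels):
        return {"leaf": True, "label": majority(labels), "n": len(points)}
    i = best[1]
    return {
        "leaf": False,
        "threshold": (points[i - 1][0] + points[i][0]) / 2,
        "left": grow(points[:i], min_leaf),
        "right": grow(points[i:], min_leaf),
    }

def leaves(tree):
    if tree["leaf"]:
        return [tree]
    return leaves(tree["left"]) + leaves(tree["right"])

# Hypothetical data: true label is x >= 0.5, with 20% of labels flipped.
rng = random.Random(2)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x >= 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))
data.sort()

big_tree = grow(data, min_leaf=1)     # no effective size limit
small_tree = grow(data, min_leaf=20)  # growth stops at 20 examples per leaf
print("leaves with min_leaf=1: ", len(leaves(big_tree)))
print("leaves with min_leaf=20:", len(leaves(small_tree)))
```

The minimum leaf size is itself a complexity parameter, so a sensible value would normally be chosen by comparing holdout performance across several settings rather than fixed in advance.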
A general method for avoiding overfitting
Avoiding overfitting for parameter optimization
Sidebar: Beware of "multiple comparisons"
Data mining balances finding the best-fitting model against overfitting. Overfitting can be detected with fitting graphs and/or holdout-based calculations such as cross-validation.