Overfitting and Its Avoidance
Chapter 5
Generalization
Different data
Beyond training data
Example?
Sample of general population
Overfitting
Tailored models
Only training data
At the expense of generalization
Example?
Table model
How to eliminate?
No single choice
Manage complexity
Cross-validation
Sophisticated holdout evaluation
Better use of limited dataset
Splits data into folds
Iterates "k" times
Performance estimates
Shows how performance varies across folds
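The nodes above summarize k-fold cross-validation. A minimal sketch, assuming scikit-learn is available; the dataset, k=5, and the decision-tree learner are illustrative choices, not taken from the chapter:

```python
# k-fold cross-validation: split the data into folds, iterate k times,
# and collect one performance estimate per fold.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each iteration trains on k-1 folds and evaluates on the held-out fold,
# making better use of a limited dataset than a single holdout split.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=kfold)

print("Fold accuracies:", scores)                              # k performance estimates
print("Mean %.3f, Std %.3f" % (scores.mean(), scores.std()))   # how varied is performance?
```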
Learning Curves
Generalization performance vs. amount of training data
Steep initially
Then flattens once additional data can no longer improve accuracy
Analytical tool
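A sketch of a learning curve, plotting holdout (generalization) accuracy against training-set size; scikit-learn and matplotlib are assumed, and the dataset and model are placeholders:

```python
# Learning curve: generalization performance vs. amount of training data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

sizes, train_scores, val_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

# The curve is typically steep at first, then flattens once additional
# data can no longer improve accuracy much.
plt.plot(sizes, val_scores.mean(axis=1), marker="o", label="holdout accuracy")
plt.xlabel("Number of training instances")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```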
Tree Induction
Limit tree size
Keeps low complexity
Specify a minimum number of instances per leaf
Use hypothesis test
"Prune" it back
Reduces complexity
Estimate accuracy improvements
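A sketch of both tree-complexity controls, assuming scikit-learn (whose pruning uses cost-complexity pruning via ccp_alpha rather than the hypothesis-test approach mentioned above; the dataset and parameter values are illustrative):

```python
# Two ways to control decision-tree complexity: stop early or prune back.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# Option 1: limit tree size by requiring a minimum number of instances
# per leaf, keeping complexity low from the start.
limited = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X_train, y_train)

# Option 2: grow the tree, then "prune" it back; a larger ccp_alpha
# removes more branches and reduces complexity.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("limited", limited), ("pruned", pruned)]:
    print(name, "holdout accuracy: %.3f" % model.score(X_hold, y_hold))
```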
Main Goal
Control Complexity
Model Regularization
General method
Controls complexity
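One common instance of regularization, sketched with scikit-learn's L2-penalized logistic regression; the dataset and the particular C values are illustrative assumptions:

```python
# Regularization: add an explicit complexity penalty to the learning objective.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# In scikit-learn, C is the inverse regularization strength:
# smaller C = stronger penalty on large coefficients = lower complexity.
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(penalty="l2", C=C, max_iter=5000)
    score = cross_val_score(model, X, y, cv=5).mean()
    print("C=%g  mean cross-validated accuracy: %.3f" % (C, score))
```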
How to recognize?
Fitting graphs
Accuracy as a function of complexity
Acquire holdout data
Estimate generalization performance
Accuracy depends on complexity
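A minimal fitting-graph sketch, assuming scikit-learn and using tree depth as the complexity knob (both choices are illustrative); accuracy is estimated on acquired holdout data at each complexity level:

```python
# Fitting graph: accuracy as a function of model complexity,
# estimated on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

for depth in (1, 2, 4, 8, 16, None):                      # None = unlimited depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    print("max_depth=%-4s  train=%.3f  holdout=%.3f"
          % (depth, tree.score(X_train, y_train), tree.score(X_hold, y_hold)))
```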
In Tree Induction
Can be grown until perfectly accurate on the training data
Tends to overfit when
Leaves are pure
Pure subsets
Total classification
Flexible
Measures 2 values
Training set accuracy
Holdout set accuracy
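A quick sketch of those two measurements on a fully grown tree (pure leaves), again assuming scikit-learn and an illustrative dataset:

```python
# A tree grown until its leaves are pure classifies the training data
# (nearly) perfectly, while holdout accuracy is typically lower.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Training set accuracy: %.3f" % full_tree.score(X_train, y_train))
print("Holdout set accuracy:  %.3f" % full_tree.score(X_hold, y_hold))
```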
In Math Functions
Increase in dimensionality
Possibilities of better fit
With arbitrary datapoints
Example?
Automatic feature selection
More attributes
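A small synthetic sketch of that effect, assuming NumPy and scikit-learn: adding purely random attributes (higher dimensionality) lets a linear model fit arbitrary target values better and better on the training data.

```python
# More attributes (higher dimensionality) = more freedom to fit
# arbitrary data points, even when the attributes are pure noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_points = 50
y = rng.normal(size=n_points)             # arbitrary targets with no real structure
X_full = rng.normal(size=(n_points, 50))  # a pool of purely random attributes

for n_attributes in (1, 5, 20, 50):
    X = X_full[:, :n_attributes]          # use the first n_attributes columns
    r2 = LinearRegression().fit(X, y).score(X, y)
    print("attributes=%2d  training R^2=%.3f" % (n_attributes, r2))
# The training fit improves with dimensionality even though the attributes
# carry no information about y: overfitting, not real predictive power.
```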
"Fundamental trade-off between model complexity and the possibility of overfitting"