CH5 Overfitting and Its Avoidance (Fitting and overfitting (Overfitting in…
CH5 Overfitting and Its Avoidance
table model:It memorizes the training data and performs no generalization.
"fit" or "overfit"
Fitting and overfitting
:endency of data mining procedures to tailor models to the training data
A corresponding baseline for a regression model is a simple model that always predicts the mean or median value of the target variable.
Overfitting in Tree Induction
Overfitting in Mathematical Functions
Example: Overfitting Linear Functions:
a linear model as described in Equation 4-2:
f( )=w0 +w1x1 +w2x2 +w3x3
the ratio of x2 and x3 is important, so we add a new attribute x5 = x2/x3. Now we’re trying to find the parameters (weights) of:
f( )=w0 +w1x1 +w2x2 +w3x3 +w4x4 +w5x5
modelers carefully prune the attributes in order to avoid overfitting
Example: Overfitting Linear Functions
Cross-validation; Attribute selection; Tree pruning; Regulariza‐
a more sophisticated holdout training and testing procedure
makes better use of a limited dataset
Sidebar: Building a modeling “laboratory”
A learning curve shows the generalization performance—the per‐ formance only on testing data, plotted against the amount of training data used
control the complexity of the models induced from the data
using the subtraining/ validation split to pick the best complexity without tainting the test set, and building a model of this best complexity on the entire training set (subtraining plus validation)
Avoiding Overfitting for Parameter Optimization
complexity control: finding the “right” balance between the fit to the data and the complexity of the model.