Chapter 5: Overfitting & Its Avoidance
Generalization
the property of a model or modeling process, whereby the model applies to data that were not used to build the model
in contrast, a model tailored to fit the training data perfectly may not generalize
Overfitting
tendency of data mining procedures to tailor models to the training data, at the expense of generalization to previously unseen data points
Fitting Graph
shows the accuracy of a model, on both training and holdout data, as a function of model complexity
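A fitting graph can be sketched with a few lines of plain Python. The example below is only an illustration under stated assumptions: the synthetic data, the piecewise-constant "bin" model, and the names `make_data`, `fit_bins`, and `mse` are all hypothetical, and it reports error (lower is better) rather than accuracy. Complexity is the number of bins.

```python
import random

def make_data(n, rng):
    # Hypothetical synthetic task: y = x^2 plus Gaussian noise, x in [0, 1).
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x * x + rng.gauss(0, 0.1)))
    return data

def fit_bins(train, n_bins):
    # Piecewise-constant model: predict the mean training y within each bin.
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in train:
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(y for _, y in train) / len(train)
    # Bins with no training points fall back to the overall training mean.
    return [sums[b] / counts[b] if counts[b] else overall for b in range(n_bins)]

def mse(model, data):
    # Mean squared error of the bin model on the given rows.
    n_bins = len(model)
    total = 0.0
    for x, y in data:
        b = min(int(x * n_bins), n_bins - 1)
        total += (y - model[b]) ** 2
    return total / len(data)

rng = random.Random(0)
train = make_data(40, rng)
holdout = make_data(200, rng)  # never used for fitting

# One row per complexity level: (complexity, training error, holdout error).
fitting_graph = []
for n_bins in [1, 2, 4, 8, 16, 32, 64]:
    model = fit_bins(train, n_bins)
    fitting_graph.append((n_bins, mse(model, train), mse(model, holdout)))

for n_bins, train_err, holdout_err in fitting_graph:
    print(f"bins={n_bins:3d}  train_mse={train_err:.4f}  holdout_mse={holdout_err:.4f}")
```

Because each bin count here refines the previous one, training error can only fall as bins are added. Holdout error typically stops improving at some point and then gets worse, which is the signature of overfitting; the exact numbers depend on the seed.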
Holdout data
data that will not be used to build the model
estimate generalization performance by comparing the model's predicted values to the true values on the holdout data
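As a minimal sketch of holdout evaluation, the example below sets part of a data set aside, fits a model on the rest, and compares predictions to true labels on the held-out part. The toy threshold classifier, the noisy synthetic labels, the 30% holdout fraction, and the helper names `holdout_split`, `fit_threshold`, and `accuracy` are assumptions made for illustration.

```python
import random

def holdout_split(rows, holdout_fraction=0.3, seed=0):
    # Shuffle a copy, then reserve the first part as holdout data.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_holdout = round(len(rows) * holdout_fraction)
    return rows[n_holdout:], rows[:n_holdout]  # (train, holdout)

def fit_threshold(train):
    # Toy model: choose the cut point on x that classifies the most
    # training rows correctly (predict True when x >= cut).
    best_cut, best_correct = None, -1
    for cut, _ in train:
        correct = sum((x >= cut) == label for x, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def accuracy(cut, rows):
    # Fraction of rows where the predicted label matches the true label.
    return sum((x >= cut) == label for x, label in rows) / len(rows)

# Hypothetical data: the label depends on x, with some noise.
rng = random.Random(1)
rows = []
for _ in range(200):
    x = rng.random()
    rows.append((x, (x + rng.gauss(0, 0.15)) >= 0.5))

train, holdout = holdout_split(rows)
cut = fit_threshold(train)            # the holdout rows are never seen here
train_acc = accuracy(cut, train)
holdout_acc = accuracy(cut, holdout)  # estimate of generalization performance
print("training accuracy:", train_acc)
print("holdout accuracy: ", holdout_acc)
```

The training accuracy is measured on data the model was tuned to, so only the holdout figure is a fair estimate of how the model would do on unseen data.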
Why is overfitting bad?
incapable of generalization
beyond a certain complexity, added complexity improves the fit to the training data but no longer improves performance on unseen data
From Holdout Evaluation to Cross-Validation
A holdout test set gives us an estimate of generalization performance, but it's just a single estimate
Cross-Validation - a more sophisticated holdout procedure
makes better use of limited data sets
unlike splitting the data into one training set and one holdout set, cross-validation computes its estimates over all the data by performing multiple splits
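The multiple-split idea can be sketched as k-fold cross-validation: every row lands in the test fold exactly once, so the procedure yields k estimates instead of one. The toy threshold classifier, the synthetic data, the choice of k = 5, and the helper names `k_fold_indices` and `cross_validate` are assumptions for illustration only.

```python
import random
import statistics

def k_fold_indices(n_rows, k, seed=0):
    # Shuffle the row indices once, then deal them into k nearly equal folds.
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_threshold(train):
    # Toy model: cut point on x that classifies the most training rows correctly.
    best_cut, best_correct = None, -1
    for cut, _ in train:
        correct = sum((x >= cut) == label for x, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def accuracy(cut, rows):
    return sum((x >= cut) == label for x, label in rows) / len(rows)

def cross_validate(rows, k=5):
    # Each fold serves as the test set once; the other k-1 folds are training data.
    scores = []
    for test_idx in k_fold_indices(len(rows), k):
        test_set = set(test_idx)
        train = [rows[j] for j in range(len(rows)) if j not in test_set]
        test = [rows[j] for j in test_idx]
        scores.append(accuracy(fit_threshold(train), test))
    return scores

# Hypothetical data: the label depends on x, with some noise.
rng = random.Random(2)
rows = []
for _ in range(100):
    x = rng.random()
    rows.append((x, (x + rng.gauss(0, 0.15)) >= 0.5))

folds = k_fold_indices(len(rows), 5)
scores = cross_validate(rows, k=5)
print("per-fold accuracy:", scores)
print("mean:", statistics.mean(scores), "std dev:", statistics.stdev(scores))
```

Reporting the mean together with the spread across folds gives a sense of how much a single holdout estimate could vary.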
Learning Curves
a plot of the generalization performance against the amount of training data
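shows only generalization (holdout) performance, whereas a fitting graph shows training and holdout performance against model complexity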
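A learning curve can be sketched by training the same kind of model on growing subsets of the training data and scoring each one on a fixed holdout set. As before, the toy threshold classifier, the synthetic data, the training sizes, and the helper names are assumptions made for illustration.

```python
import random

def make_rows(n, rng):
    # Hypothetical data: the label depends on x, with some noise.
    rows = []
    for _ in range(n):
        x = rng.random()
        rows.append((x, (x + rng.gauss(0, 0.15)) >= 0.5))
    return rows

def fit_threshold(train):
    # Toy model: cut point on x that classifies the most training rows correctly.
    best_cut, best_correct = None, -1
    for cut, _ in train:
        correct = sum((x >= cut) == label for x, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut

def accuracy(cut, rows):
    return sum((x >= cut) == label for x, label in rows) / len(rows)

rng = random.Random(3)
train_pool = make_rows(400, rng)
holdout = make_rows(500, rng)  # fixed; never used for fitting

# One point per training-set size: (amount of training data, holdout accuracy).
learning_curve = []
for n_train in [5, 10, 20, 50, 100, 200, 400]:
    cut = fit_threshold(train_pool[:n_train])
    learning_curve.append((n_train, accuracy(cut, holdout)))

for n_train, holdout_acc in learning_curve:
    print(f"n_train={n_train:3d}  holdout_accuracy={holdout_acc:.3f}")
```

Holdout accuracy usually rises quickly at small training sizes and then levels off, though with a single random sample the curve need not be perfectly smooth.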
TRACY GIANG