Chapter 5: Mind Map
Fitting Graph
Based on accuracy as a function of model complexity
Generalization Performance
Comparing predicted values w/hidden true values
Based on how complex you allow the model to be
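A rough sketch of this idea, assuming scikit-learn and its bundled iris data (all variable names are illustrative): hold out part of the data so its true values stay hidden during training, then compare the model's predictions against them.

```python
# Holdout evaluation: compare predictions with hidden true values
# (a minimal sketch, assuming scikit-learn is available).
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out 30% of the data; its true labels stay hidden during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Generalization performance: predictions vs. the hidden true values.
train_acc = accuracy_score(y_train, model.predict(X_train))
test_acc = accuracy_score(y_test, model.predict(X_test))
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between training and test accuracy is exactly what the fitting graph visualizes as complexity grows.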
Churn Data-set
Must mistrust accuracy measured on the training set
Overfitting
Tendency of models to tailor themselves to the training data
At the expense of Generalization
All model types can and do over-fit
Recognize and manage it in a principled way
Increases when you allow more flexibility
Why is it bad?
Model will pick up harmful correlations
All models are susceptible to over-fitting effects
Cross-validation:
Estimates generalization performance
Every data point is used for testing across the folds
More sophisticated than a single holdout split
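A minimal sketch of cross-validation, assuming scikit-learn (the model and fold count are illustrative): each fold holds out a different slice of the data, so every point is tested exactly once.

```python
# Cross-validation: average accuracy over folds estimates
# generalization performance (sketch, assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# 5 folds: each fold is held out once while the rest trains the model.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(f"fold accuracies: {scores.round(2)}")
print(f"estimated generalization accuracy: {scores.mean():.2f}")
```

The spread of the per-fold scores also gives a sense of how stable the estimate is, which a single holdout split cannot.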
Avoidance
Tree induction
Stop growing the tree
Grow until it is too large, then prune it back
Estimate the generalization performance of each model
Parameter optimization
Find the right balance
Equations
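One way to sketch parameter optimization, assuming scikit-learn (the parameter grid is illustrative): use cross-validation to pick the complexity setting that balances fit against over-fitting.

```python
# Parameter optimization: search for the tree-complexity setting
# with the best cross-validated accuracy (sketch, assuming scikit-learn).
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Candidate complexity limits: shallow trees under-fit, deep trees over-fit.
grid = {"max_depth": [1, 2, 3, 5, 8, None]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), grid, cv=5)
search.fit(X, y)
print("best complexity setting:", search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.2f}")
```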
Over-fitting in Tree Induction
Splitting to get "pure" subsets
Growing trees until the leaves are pure: how to over-fit
Number of nodes = complexity of the tree
Measure accuracy on training and test set
If a leaf is not pure: predict based on the average (majority) of its instances
Sweet spot: just before the tree starts to over-fit
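The points above can be sketched as a fitting graph, assuming scikit-learn and its bundled breast-cancer data (the depth range is illustrative): grow trees of increasing complexity and measure accuracy on both the training and the test set.

```python
# Fitting graph for tree induction: accuracy vs. tree complexity
# (sketch, assuming scikit-learn).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

train_accs, test_accs = [], []
for depth in [1, 2, 4, 8, 16]:   # the number of nodes grows with depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_accs.append(tree.score(X_train, y_train))
    test_accs.append(tree.score(X_test, y_test))
    print(depth, round(train_accs[-1], 2), round(test_accs[-1], 2))
# Training accuracy keeps rising with complexity; test accuracy peaks
# near the sweet spot and then flattens or degrades as the tree over-fits.
```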
Mathematical Functions
Adding more xi's makes the function more complex
wi = a learned parameter
More attributes = better fit to the training data (and more risk of over-fitting)
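The points above describe a linear function f(x) = w0 + w1·x1 + … + wn·xn, where each wi is a learned weight on attribute xi. A minimal sketch (the weights here are made up for illustration, not learned from data):

```python
# A linear classification function: weighted sum of attributes,
# classify by the sign of the result. Weights are illustrative.
def f(x, w):
    """f(x) = w0 + w1*x1 + ... + wn*xn for attributes x and weights w."""
    w0, *ws = w
    return w0 + sum(wi * xi for wi, xi in zip(ws, x))

w = [-1.0, 0.5, 2.0]                 # w0 plus one wi per attribute xi
simple = f([2.0, 1.0], w)            # two attributes: simpler function
# Adding another attribute (and its wi) makes the function more complex.
w_more = [-1.0, 0.5, 2.0, -0.3]
complex_ = f([2.0, 1.0, 4.0], w_more)
print(simple, complex_)
```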
Over-fitting and its Avoidance
Flexibility when searching the data is what allows over-fitting
Patterns that do not generalize: over-fitting
Generalization
Extreme example: the table model
Memorizes the training data and doesn't generalize
For previously unseen data
If it fails on unseen data, more realistic models will fail too
A model that does not fit other data is over-fit
Linear
Classes are distinct and separable
Iris model from another chapter
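A minimal sketch of the linear case on the iris data, assuming scikit-learn: setosa is linearly separable from the other species, so a linear classifier splits those classes cleanly.

```python
# Linear model on distinct, separable classes (sketch, assuming
# scikit-learn and its bundled iris data).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
is_setosa = (y == 0).astype(int)     # setosa vs. the rest: separable
model = LogisticRegression(max_iter=1000).fit(X, is_setosa)
acc = model.score(X, is_setosa)
print(f"accuracy on separable classes: {acc:.2f}")
```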