Chapter 5: Overfitting and its Avoidance
table model
memorizes the training data and performs no generalization
remember, every dataset is a finite sample of a population
overfitting: the tendency of data mining procedures to tailor models to the training data
at the expense of generalization to previously unseen data points
holdout data
lab test of generalization performance
there is usually a difference between a model's accuracy on the training set and its generalization accuracy
often referred to as the test set
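The gap shows up even in a toy experiment. A minimal sketch, assuming scikit-learn and synthetic data (neither is from the book):

```python
# Compare training accuracy with holdout (test-set) accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# An unrestricted tree can memorize the training data.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("training accuracy:", model.score(X_train, y_train))  # often ~1.0
print("holdout accuracy: ", model.score(X_test, y_test))    # noticeably lower
```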
overfitting in tree induction
one remedy: restrict tree size (e.g., to a maximum of 100 nodes)
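Continuing the sketch above, one way to cap tree size; note that scikit-learn's max_leaf_nodes caps leaves rather than total nodes, so it is only a stand-in for the node limit mentioned here:

```python
from sklearn.tree import DecisionTreeClassifier

# Stop the tree from growing past 100 leaves so it cannot memorize.
small_tree = DecisionTreeClassifier(max_leaf_nodes=100, random_state=0)
small_tree.fit(X_train, y_train)  # reuses the split from the sketch above
print("training accuracy:", small_tree.score(X_train, y_train))
print("holdout accuracy: ", small_tree.score(X_test, y_test))
```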
why is overfitting bad?
as the model gets more complex, it is allowed to pick up harmful spurious correlations
these do not represent characteristics of the population in general
cross-validation is a more sophisticated holdout training and testing procedure
makes better use of a limited data set
splitting labeled datasets into partitions called FOLDS
usually 5-10 folds
compute the average and standard deviation of performance across the folds
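A minimal sketch of the procedure, assuming scikit-learn and synthetic data:

```python
# 10-fold cross-validation: report mean and standard deviation across folds.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```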
logistic regression vs classification trees
spurious: not being what it purports to be; false or fake.
Learning curves
if the training set size changes, you may also expect different generalization performance
a plot of the generalization performance against the amount of training data is called the learning curve
usually have a characteristic shape: steep improvement at first, flattening out as training data grows
shows generalization performance: performance on testing data only
A fitting graph also measures generalization, but it is plotted against model complexity (for a fixed amount of training data)
for smaller data, tree induction will tend to overfit more
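A sketch of plotting a learning curve, assuming scikit-learn and matplotlib (synthetic data, not the book's examples):

```python
# Plot cross-validated generalization accuracy against training set size.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5)

plt.plot(sizes, test_scores.mean(axis=1), marker="o")
plt.xlabel("training set size")
plt.ylabel("generalization accuracy (cross-validated)")
plt.title("Learning curve")
plt.show()
```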
to avoid overfitting we attempt to reduce the complexity of the model
if we have a group of models we want to rank by generalization performance, use nested cross-validation
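A minimal sketch of nested cross-validation, assuming scikit-learn: an inner loop picks each model's complexity, and an outer loop estimates generalization performance:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Inner folds: tune complexity (tree size) for each training split.
inner = GridSearchCV(DecisionTreeClassifier(random_state=0),
                     param_grid={"max_leaf_nodes": [10, 50, 100, 200]},
                     cv=5)
# Outer folds: estimate generalization performance of the tuned model.
outer_scores = cross_val_score(inner, X, y, cv=5)
print("nested CV accuracy:", outer_scores.mean())
```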
regularization: penalize model complexity directly in the objective function (e.g., L1 or L2 penalties for logistic regression)
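A hedged sketch using scikit-learn's L2-regularized logistic regression (C is the inverse penalty strength; the values swept here are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# Smaller C means a stronger complexity penalty (more regularization).
for C in (0.01, 0.1, 1.0, 10.0):
    acc = cross_val_score(LogisticRegression(C=C, max_iter=1000),
                          X, y, cv=5).mean()
    print(f"C={C}: cross-validated accuracy {acc:.3f}")
```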
tradeoffs between complexity and overfitting
the best way to test for overfitting is with holdout data
a fitting graph
has 2 curves: one for performance on the training data, one for performance on the holdout data (the base error rate is sometimes drawn as a reference line)
the gap between the two curves indicates overfitting
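A sketch that produces such a fitting graph, assuming scikit-learn and matplotlib, with tree size swept as the complexity axis:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Train one tree per complexity level and record both accuracies.
complexities = [2, 5, 10, 25, 50, 100, 200, 400]
train_acc, test_acc = [], []
for n in complexities:
    tree = DecisionTreeClassifier(max_leaf_nodes=n, random_state=0).fit(X_tr, y_tr)
    train_acc.append(tree.score(X_tr, y_tr))
    test_acc.append(tree.score(X_te, y_te))

plt.plot(complexities, train_acc, label="training accuracy")
plt.plot(complexities, test_acc, label="holdout accuracy")
plt.xlabel("model complexity (max leaf nodes)")
plt.ylabel("accuracy")
plt.legend()
plt.title("Fitting graph")
plt.show()
```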
reining in model complexity to avoid overfitting