Ch. 5
Overfitting
model tailored too closely to the training data
degrades generalization performance on new data
Recognize overfitting
fitting graph
shows model accuracy as a function of model complexity
plots complexity vs. error on training and holdout data
holdout data
not for building the model
"test set"
tree induction
growing the tree until every leaf is pure
= overfit because too complex
too many nodes
find sweet spot
where holdout accuracy peaks, before it diverges from training accuracy
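The fitting-graph idea above can be sketched with scikit-learn (a sketch on a synthetic dataset; the dataset, depths, and seeds are all illustrative, not from the book): hold out a test set, grow trees of increasing depth, and compare training vs. holdout accuracy to locate the sweet spot.

```python
# Fitting-graph sketch: training vs. holdout accuracy as tree complexity grows.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
# Holdout ("test set") plays no part in building the model.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 2, 4, 8, None):  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, round(tree.score(X_train, y_train), 3),
          round(tree.score(X_test, y_test), 3))
# Training accuracy keeps rising with depth; holdout accuracy typically
# peaks and then falls off as the tree starts to overfit.
```

The fully grown tree (max_depth=None) fits the training data perfectly while scoring worse on the holdout, which is exactly the gap the fitting graph makes visible.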
Mathematical functions
adding dimensions and variables increases model flexibility
with enough attributes the model can fit the training data perfectly
must carefully choose attributes to avoid overfitting
Linear Functions
Support Vector Machine
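A minimal numerical illustration of the "more parameters = perfect fit" point (pure NumPy; the data and degree are arbitrary): a polynomial with as many coefficients as data points interpolates the training data exactly, noise included.

```python
# With enough parameters a function can fit any training set exactly:
# a degree-(n-1) polynomial interpolates n points, noise and all.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 8)
y = x + rng.normal(scale=0.1, size=8)   # noisy linear relationship

coeffs = np.polyfit(x, y, deg=7)        # 8 parameters for 8 points
residuals = np.polyval(coeffs, x) - y
print(np.max(np.abs(residuals)))        # ~0: a "perfect" (overfit) fit
```

The true relationship here is linear; the degree-7 fit has memorized the noise, which is why perfect training fit is a warning sign, not a goal.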
Avoid overfitting
control complexity of models
tree induction
keeps growing the tree until it fits the training data
hence overfitting
Avoid overfitting
stop growing tree
require a minimum number of instances per leaf
grow until too complex then "prune" it
ensure not to reduce accuracy while pruning
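Both strategies above map directly onto scikit-learn's tree parameters (a sketch; the parameter values are illustrative, not tuned): `min_samples_leaf` stops growth early, while `ccp_alpha` grows the tree and then applies minimal cost-complexity pruning.

```python
# Two ways DecisionTreeClassifier controls tree complexity.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Baseline: grow the tree until the leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# 1) Stop growing early: require a minimum number of instances per leaf.
stopped = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)

# 2) Grow fully, then "prune": cost-complexity pruning via ccp_alpha.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X, y)

print(full.get_n_leaves(), stopped.get_n_leaves(), pruned.get_n_leaves())
```

Both controlled trees end up with far fewer leaves than the fully grown one; whether accuracy survives the pruning is what the holdout check is for.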
General method
nested holdout
hold out a second set from the training data to pick complexity, then rebuild the model using the entire training set
complexity control
optimize some combo of fit and simplicity
called regularization
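Regularization in practice, sketched with scikit-learn's logistic regression (illustrative data and values): the objective trades off fit against coefficient size, and `C` is the inverse regularization strength, so smaller `C` means a stronger push toward simplicity.

```python
# Regularization sketch: smaller C = stronger penalty = smaller weights.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

for C in (100.0, 1.0, 0.01):
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    # Total coefficient magnitude shrinks as the penalty grows.
    print(C, round(np.abs(model.coef_).sum(), 2))
```

Choosing `C` itself is where the nested holdout (or cross-validation) comes in: the penalty strength is picked on data the final evaluation never sees.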
Learning Curves - Analytical tool
performance vs. amount of training data
shows generalization performance
performance on holdout/test data against amount of training data
steep initially
steep while the model still has much to learn from new data
more training data = curve flattens (diminishing returns)
Versus Fitting graph
shows generalization against complexity
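scikit-learn has a helper that computes exactly this curve (a sketch; the estimator, sizes, and seed are illustrative): cross-validated generalization performance at increasing training-set sizes.

```python
# Learning-curve sketch: holdout performance vs. training-set size.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    shuffle=True, random_state=0)

print(sizes)                       # increasing amounts of training data
print(test_scores.mean(axis=1))    # generalization accuracy at each size
```

Note the contrast with the fitting graph: here the x-axis is data volume at fixed model family, not model complexity at fixed data.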
Accuracy varies based on data
data size
large data size
tree induction more accurate
smaller data size
logistic regression more accurate
(not always)
logistic regression is less flexible
so it tends to overfit less
tree induction is more flexible and will overfit more
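A hedged experiment for the data-size claim (synthetic data, illustrative sizes; as the note says, the pattern holds often but not always): compare cross-validated accuracy of the two model families on a small and a large sample of the same dataset.

```python
# Tree induction vs. logistic regression at two training-set sizes.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for n in (100, 5000):
    lr = cross_val_score(LogisticRegression(max_iter=1000),
                         X[:n], y[:n], cv=5).mean()
    dt = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X[:n], y[:n], cv=5).mean()
    print(n, round(lr, 3), round(dt, 3))
```

No expected winner is printed on purpose: the ranking depends on the dataset, which is the chapter's point.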
Holdout Data
Cross-Validation
estimates generalization performance
better form of holdout
estimates over all the data
iterates through data
"Folds" made from original dataset
(k-1)/k for training
1/k for testing
Each iteration = model
calculate Variance
understand variance across data sets
assess confidence in the performance estimate
generates multiple performance measures
tells average model behavior
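The whole cross-validation procedure in a few lines of scikit-learn (a sketch; dataset, estimator, and k are illustrative): k folds yield k models and k performance measures, whose mean and standard deviation summarize average behavior and its variance across data splits.

```python
# k-fold cross-validation sketch: k models, k performance measures.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

k = 10  # each fold: (k-1)/k of the data for training, 1/k for testing
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=k)

print(len(scores))                            # one accuracy per fold
print(scores.mean().round(3), scores.std().round(3))  # average and spread
```

A single holdout split gives one number; the fold-to-fold spread here is what lets you judge how much to trust that number.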