Overfitting and Its Avoidance
Generalization
Predict well for instances that have not been observed yet
Table model
Memorizes the training data and performs no generalization
A model that fits historical data perfectly may still be useless in practice
The model applies to data that was not used to build it
We want models to apply not just to the exact training set but to the general population from which the training data was drawn
Overfitting
Finding chance occurrences that look like interesting patterns but do not generalize
Tendency of data mining procedures to tailor models to the training data
At the expense of generalization to previously unseen data points
Memorization is the most extreme form of overfitting
All data mining techniques tend to overfit, some more than others
If you look hard enough, you can always find patterns in a data set, even in pure noise (see the sketch below)
Fundamental trade-off between model complexity and the possibility of overfitting
Best strategy is to recognize overfitting and manage complexity in a principled way
More complex models may capture real complexities of the application
May therefore lead to higher accuracy
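As an illustration, here is a minimal sketch (assuming Python with NumPy and scikit-learn, on purely synthetic data) of how a flexible model can always find "patterns": a decision tree fit to random noise scores nearly perfectly on its training data but only around chance on holdout data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # 20 random attributes, no real signal
y = rng.integers(0, 2, size=1000)      # labels independent of X (pure noise)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0)  # grown until leaves are pure
tree.fit(X_tr, y_tr)

print("training accuracy:", tree.score(X_tr, y_tr))  # ~1.0 (memorization)
print("holdout accuracy: ", tree.score(X_te, y_te))  # ~0.5 (chance)
```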
How to recognize overfitting
Holdout Data
Must hold out data for which you know the value of the target variable but which will not be used to build the model
Not actual use data
Use data is data for which you would like to predict the value of the target variable, which is unknown
Lab test of generalization performance
Hide the target values of this data from the model (and perhaps also from the modelers)
The model then predicts those values
Generalization performance is estimated by comparing the predicted values with the true values
Holdout data used this way is called the "test set"
Accuracy on the training set is called "in-sample" accuracy
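A minimal sketch of the holdout procedure, assuming Python with scikit-learn and a synthetic dataset from make_classification as a stand-in for real data: the test set's target values play no role in fitting, and comparing predictions against them estimates generalization performance.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out 30% of the labeled data; it plays no role in building the model.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

print("in-sample accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test-set accuracy :", accuracy_score(y_test, model.predict(X_test)))
```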
Fitting Graphs
Accuracy of a model as a function of its complexity
Shows the difference between a modeling procedure's accuracy on the training data and the accuracy on the holdout data as model complexity changes
Generally more overfitting as model becomes more complex
Chance of overfitting increases as one allows the modeling procedure more flexibility in the models it can produce
With each new row of the training set memorized (as in the table model), training-set error decreases, but holdout error does not
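A minimal sketch of a fitting graph, assuming Python with scikit-learn and matplotlib on synthetic data, using tree depth as the complexity axis; the widening gap between the training and holdout curves is the signature of overfitting.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

depths = range(1, 21)                  # complexity axis: allowed tree depth
train_acc, test_acc = [], []
for d in depths:
    tree = DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_tr, y_tr)
    train_acc.append(tree.score(X_tr, y_tr))
    test_acc.append(tree.score(X_te, y_te))

plt.plot(depths, train_acc, label="training accuracy")
plt.plot(depths, test_acc, label="holdout accuracy")
plt.xlabel("model complexity (max tree depth)")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```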
Mathematical Functions
Adding more variables/attributes can make these more complex
Modelers can move beyond a function that is strictly linear in the original attributes by adding new attributes that are nonlinear versions of the originals (e.g., squares or products of attributes)
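For example, a sketch using scikit-learn's PolynomialFeatures (one common way to do this; the data here is made up) to add squared and cross-product attributes, so that a model linear in the expanded attributes is nonlinear in the originals.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # two original attributes

poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X)       # adds x0^2, x0*x1, x1^2

print(poly.get_feature_names_out())      # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_expanded)
```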
Tree Induction
The tree should be slightly better than the lookup table because every previously unseen instance will arrive at some leaf and receive a classification
The tree gives a nontrivial classification even for instances it has not seen before
Useful to examine empirically how well the accuracy on the training data tends to correspond to the accuracy on the test data
A procedure that grows trees until the leaves are pure tends to overfit
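A minimal sketch of this tendency, assuming scikit-learn and synthetic data with injected label noise (flip_y): a tree grown until its leaves are pure memorizes the noise, while limiting growth via min_samples_leaf (one of several possible complexity controls) typically generalizes better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2,
                           random_state=0)   # flip_y injects label noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # pure leaves
limited = DecisionTreeClassifier(min_samples_leaf=20,
                                 random_state=0).fit(X_tr, y_tr)

for name, m in [("pure-leaf tree", full), ("limited tree  ", limited)]:
    print(name, "train:", round(m.score(X_tr, y_tr), 3),
          "holdout:", round(m.score(X_te, y_te), 3))
```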
Why is overfitting bad?
As a model gets more complex, it is allowed to pick up harmful spurious correlations
Every data set is a finite sample of a bigger population
There is no general analytic way to determine in advance whether a model has overfit or not
Avoiding Overfitting
Parameter optimization
5-fold cross-validation
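A minimal sketch of parameter optimization with 5-fold cross-validation, assuming scikit-learn; GridSearchCV and the max_depth grid are illustrative choices, not prescribed by the source.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 4, 6, 8, 10, None]},
    cv=5,                       # 5-fold cross-validation
    scoring="accuracy",
)
search.fit(X, y)

print("best max_depth:", search.best_params_["max_depth"])
print("mean CV accuracy:", round(search.best_score_, 3))
```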