Chapter 16: Understanding the Process
Learning Curves and Speed (16.1)
  more data = more predictability?
    achieved through
      additional features
      additional cases
    rule of thumb
      the more relevant data you have at the outset, the less likely you are to need more later
  learning curves (sketched in code below)
    y-axis
      validation scores (smaller is better for error metrics such as LogLoss)
    x-axis
      % of the training data the model was exposed to
    each point answers: how did the model do with x amount of data?
    predictive accuracy is generally better with more data
    cross-validation scores become more reliable as the amount of data grows
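Below is a minimal learning-curve sketch in Python using scikit-learn (the book works in DataRobot, which draws these curves automatically; the dataset and model here are purely illustrative):

```python
# A minimal learning-curve sketch using scikit-learn, not DataRobot.
# All data and model choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Train on 10%, 25%, 50%, 75%, and 100% of the available training data
# and record the validation score at each step (LogLoss: smaller is better).
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=[0.1, 0.25, 0.5, 0.75, 1.0],
    cv=5,
    scoring="neg_log_loss",
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    # Negate: scikit-learn reports negative log loss so that bigger is better.
    print(f"{n:5d} rows -> validation LogLoss {-score:.4f}")
```

Typically the printed LogLoss shrinks as the row count grows, which is the "more data = more predictability" pattern the curve visualizes.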
Accuracy Tradeoff (16.2)
  efficient frontier line (sketched in code below)
    plots speed vs. accuracy
      negatively correlated: the most accurate models tend to be the slowest to score
    shows which models are best (most efficient) at each speed
  is prediction time a factor for you?
    example: if website users need responses too fast for your most accurate model, you may need two models, a fast one for real-time scoring and an accurate one for everything else
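To make the efficient-frontier idea concrete, here is a small Python sketch that flags which of a set of hypothetical models sit on the frontier, i.e. are not beaten on both speed and accuracy by some other model (every name and number below is invented for illustration):

```python
# Sketch: find the "efficient frontier" among candidate models scored on
# prediction speed and validation accuracy. All figures are made up.
candidates = {
    # name: (predictions_per_second, validation_accuracy)
    "logistic_regression": (50_000, 0.86),
    "gradient_boosting":   (8_000,  0.91),
    "random_forest":       (5_000,  0.90),  # dominated: slower AND less accurate than GBM
    "deep_net":            (1_500,  0.93),
}

def on_frontier(name, stats):
    """A model is efficient if no other model is both faster and more accurate."""
    speed, acc = stats
    return not any(
        s > speed and a > acc
        for other, (s, a) in candidates.items()
        if other != name
    )

for name, stats in candidates.items():
    tag = "efficient" if on_frontier(name, stats) else "dominated"
    print(f"{name:20s} {tag}")
```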
Blueprints (16.3)
  each model works differently depending on the algorithm it uses
  you can't see the exact inner workings
    unless you look in the DataRobot model docs
  example: regularized logistic regression (its preprocessing steps are sketched in code below)
    imputation
      the median is used to fill in missing values
      the justification for this choice is given, along with which values were imputed
    standardization
      the mean is set to zero
      the standard deviation is scaled to unit variance (i.e., 1)
    one-hot encoding
      converts a categorical feature into numeric columns based on its unique values
      min_card/min_max
        thresholds on how many unique values a feature needs for this encoding
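As a rough stand-in for what this blueprint does internally, here is a scikit-learn pipeline sketch combining median imputation, standardization, and one-hot encoding in front of a regularized logistic regression (the column names and the min_frequency threshold are assumptions, not DataRobot's actual settings):

```python
# A minimal scikit-learn sketch of the preprocessing a regularized logistic
# regression blueprint performs. DataRobot assembles this automatically,
# so everything below is an illustrative stand-in, not its real internals.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "income"]          # hypothetical column names
categorical_features = ["region", "channel"]  # hypothetical column names

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values with the median
    ("scale", StandardScaler()),                   # mean -> 0, standard deviation -> 1
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    # One numeric indicator column per unique value; min_frequency is
    # scikit-learn's rough analogue of a minimum-cardinality threshold.
    ("cat", OneHotEncoder(min_frequency=10), categorical_features),
])

# Regularized logistic regression on top of the preprocessed features
# (C is the inverse of the regularization strength).
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])
# Usage would be model.fit(train_df, y) on a DataFrame with those columns.
```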
Hyperparameters (16.4)
  settings that control how an algorithm learns; tuning them is among the most important and most complex parts of the process (a small tuning sketch follows)
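For a concrete picture of what tuning means, here is a small scikit-learn sketch that grid-searches a single hyperparameter, the regularization strength C of a logistic regression (DataRobot performs this kind of search automatically; the grid here is illustrative):

```python
# A minimal hyperparameter-tuning sketch via grid search. DataRobot tunes
# hyperparameters for you; this scikit-learn example is only illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Try several regularization strengths and keep the one with the best
# cross-validated LogLoss.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X, y)
print("best C:", search.best_params_["C"])
print("best LogLoss:", -search.best_score_)
```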