Larsen - Ch. 16: Understanding the Process
Learning Curves and Speed
Additional cases
Additional features
Key Concept/Terms
Regularized Logistic Regression
Elastic-Net
LogLoss
Diminishing Marginal Improvement: the greater the amount of relevant data at the outset, the less likely additional data is to improve predictive accuracy
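LogLoss appears above only as a term. As a rough sketch, assuming the standard binary definition (the average negative log-likelihood of the true labels), it can be computed as below. The function name `log_loss` and the sample probabilities are illustrative and do not come from the chapter.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Made-up predictions for three rows with true labels 1, 0, 1.
confident = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
uncertain = log_loss([1, 0, 1], [0.6, 0.4, 0.5])
print(confident < uncertain)  # lower LogLoss means better-calibrated predictions
```

A model that always predicts 0.5 scores ln(2), roughly 0.693, which is a handy reference point when reading LogLoss values.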
Additional data
Things to consider
Calculate the cost of that data, if it is available, before adding it
Data can get "stale" over time
Use cross-validation rather than the single validation sample (especially useful when there is a lot of data)
Accuracy Tradeoffs
Speed vs. Accuracy tab
Models
Advanced AVG blender - slowest model
Possible to set up multiple prediction servers to speed up predictions if model is accurate enough
Efficient frontier
Steps
Calculate speed of slowest model (assuming it is the most accurate)
Compare result with predictive needs
If the slowest model produces results faster than needed, ignore speed as a selection criterion and use that model
If time is a factor, follow the efficient frontier to the left until the most efficient model with reasonable accuracy is found
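The steps above can be sketched in code. This is a minimal illustration, assuming each model is described by a prediction time and a LogLoss score (lower is better for both). The function names, model names other than the AVG Blender, and all numbers are hypothetical.

```python
def efficient_frontier(models):
    """Keep models that no other model beats on both speed and LogLoss.

    models: list of (name, prediction_seconds, logloss) tuples.
    Returns the frontier sorted from fastest to slowest.
    """
    frontier = []
    for m in models:
        dominated = any(
            o[1] <= m[1] and o[2] <= m[2] and (o[1] < m[1] or o[2] < m[2])
            for o in models
        )
        if not dominated:
            frontier.append(m)
    return sorted(frontier, key=lambda m: m[1])

def pick_model(models, max_seconds):
    """Walk the frontier from the slowest model toward faster ones and
    return the first model name that meets the speed requirement."""
    for name, seconds, _ in reversed(efficient_frontier(models)):
        if seconds <= max_seconds:
            return name
    return None  # no model is fast enough

# Hypothetical leaderboard: (name, seconds per batch, LogLoss).
models = [
    ("AVG Blender", 4.0, 0.580),
    ("Gradient Boosted Trees", 1.5, 0.590),
    ("Decision Tree", 1.0, 0.640),
    ("Elastic-Net", 0.5, 0.600),
]
print(pick_model(models, max_seconds=5.0))
print(pick_model(models, max_seconds=2.0))
```

On the frontier, a slower model is always more accurate than a faster one, so the first model that fits the time budget is also the most accurate one that fits. Whether its accuracy is "reasonable" remains a judgment call that this sketch does not automate.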
Blueprints
Blueprint pane shows best model
Generate probabilities indicating the target was positive; these become features that, along with the target, are input into the Elastic-Net classifier, which generates the predictions
Imputation
Model replaces missing values with the median value of the feature
Create another feature, known as the "Indicator", that flags which values were imputed
False if row not imputed, True if imputed
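The imputation notes above can be illustrated with a short sketch. It assumes missing values are represented as `None`; the function name `impute_median` and the sample column are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Return (imputed_values, indicator) for a numeric column.

    Missing entries (None) are replaced by the median of the observed
    values; the indicator is True where a value was imputed.
    """
    med = median(v for v in values if v is not None)
    imputed = [med if v is None else v for v in values]
    indicator = [v is None for v in values]
    return imputed, indicator

# Made-up "age" column with two missing entries.
age, age_indicator = impute_median([34, None, 51, 28, None])
print(age)
print(age_indicator)
```

Keeping the indicator lets a model learn from the fact that a value was missing, which the imputed median alone would hide.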
Standardization
Algorithms like "Support Vector Machines" and some linear models struggle when features have different standard deviations
Features are scaled so that the mean is zero and the standard deviation is 1 ("unit variance")
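As a small illustration of the scaling described above, each value has the feature mean subtracted and is then divided by the feature's standard deviation. The sketch assumes the population standard deviation and a feature that is not constant (a constant feature would give a zero divisor); the function name and data are illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale a numeric feature to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return [(v - mu) / sigma for v in values]

scaled = standardize([10.0, 20.0, 30.0, 40.0])
print(round(mean(scaled), 6), round(pstdev(scaled), 6))
```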
One-hot Encoding
For any categorical feature that meets certain requirements, create a new feature for each category in the original feature
Note: if the original feature has only two values ("yes" and "no"), it becomes a single new feature with the values "True" and "False"
Min and Max specify how many unique values a feature must have to be eligible for One-hot encoding
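The one-hot encoding rules above can be sketched as follows. The function name, the `is_<category>` column naming, and the default Min and Max thresholds are assumptions chosen for illustration; they are not the values any particular tool uses.

```python
def one_hot(values, min_unique=2, max_unique=10):
    """One-hot encode a categorical column given as a list of strings.

    Returns a dict of {new_feature_name: list of bools}, or None when the
    number of unique values falls outside [min_unique, max_unique].
    """
    categories = sorted(set(values))
    if not (min_unique <= len(categories) <= max_unique):
        return None  # feature is not eligible for one-hot encoding
    if len(categories) == 2:
        # Two values need only one True/False feature.
        positive = categories[1]
        return {f"is_{positive}": [v == positive for v in values]}
    return {f"is_{c}": [v == c for v in values] for c in categories}

print(one_hot(["yes", "no", "yes"]))
print(one_hot(["red", "green", "blue", "red"]))
```

With two categories, a second column would only mirror the first, which is why a single True/False feature is enough.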