Understanding the Process
Learning curves and speed
Before settling on selected models
Would more data help predictive ability?
Two types of additional data
Additional cases
Additional features
Diminishing marginal improvement
The more relevant data available at the start of the project, the less likely additional data will improve predictability
Consider cost
If the additional data is from an earlier date than what is currently owned, it might have become stale over time
Use cross validation instead of validation when considering whether more data would help
Leaderboard tab
Learning curves
Shows validation scores on the y-axis and the percentage of available data used on the x-axis
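A learning curve like the one described above can be sketched by training the same model on growing fractions of the training data and scoring each fit on a fixed validation set. The following is a minimal sketch, not the tool's own implementation: the synthetic data, the one-variable linear model, the helper names (`fit_line`, `rmse`, `learning_curve`), and the sample fractions are all assumptions chosen for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares fit for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given data."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def learning_curve(train, valid, fractions=(0.16, 0.32, 0.64, 1.0)):
    """Return (fraction of training data used, validation score) pairs."""
    train_x, train_y = train
    valid_x, valid_y = valid
    curve = []
    for frac in fractions:
        n = max(2, int(len(train_x) * frac))  # at least two points to fit a line
        model = fit_line(train_x[:n], train_y[:n])
        curve.append((frac, rmse(model, valid_x, valid_y)))
    return curve


# Hypothetical synthetic data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(250)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 2.0) for x in xs]
train = (xs[:200], ys[:200])
valid = (xs[200:], ys[200:])

curve = learning_curve(train, valid)
for frac, score in curve:
    print(f"{frac:.0%} of data: validation RMSE = {score:.3f}")
```

If the score barely changes between the last two fractions, that suggests diminishing marginal improvement, and collecting additional cases may not be worth the cost.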
Models
Speed vs Accuracy
How rapidly the model will evaluate new cases after being put into production
Efficiency frontier
Illustrates which models offer the best trade-off between speed and accuracy
Line drawn between the dots closest to the X and Y axes
Usually occurs when measures such as speed and accuracy are negatively related to each other
Shows the top ten models by validation score
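The efficiency frontier idea can be illustrated as a Pareto frontier: a model is on the frontier if no other model is both at least as fast and at least as accurate. This is a minimal sketch under that assumption; the `ModelResult` type, the `efficiency_frontier` function, and the model names and numbers are hypothetical and not taken from any real leaderboard.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    name: str
    prediction_ms: float      # time to score new cases, lower is better
    validation_error: float   # validation score as an error, lower is better


def efficiency_frontier(models: List[ModelResult]) -> List[ModelResult]:
    """Return the models that no other model beats on both speed and error."""
    frontier: List[ModelResult] = []
    # Walk from fastest to slowest; a slower model earns its place only
    # if it has a strictly lower error than every faster model kept so far.
    for m in sorted(models, key=lambda m: (m.prediction_ms, m.validation_error)):
        if not frontier or m.validation_error < frontier[-1].validation_error:
            frontier.append(m)
    return frontier


# Hypothetical results for illustration only.
results = [
    ModelResult("Model A", 10.0, 0.30),
    ModelResult("Model B", 25.0, 0.25),
    ModelResult("Model C", 30.0, 0.28),  # slower and less accurate than B
    ModelResult("Model D", 80.0, 0.20),
    ModelResult("Model E", 90.0, 0.22),  # slower and less accurate than D
]

frontier = efficiency_frontier(results)
print([m.name for m in frontier])
```

Models off the frontier (here C and E) are dominated: another model is both faster and more accurate, so there is little reason to choose them.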
Blueprints
Imputation
Standardization
One hot encoding
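The three preprocessing steps named above can be sketched in plain Python. This is an illustrative sketch only: the function names are hypothetical, median imputation and population standard deviation are assumed choices, and an actual blueprint may apply different variants of each step.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Imputation: replace missing values (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardize(values: List[float]) -> List[float]:
    """Standardization: rescale to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mean) / std for v in values]


def one_hot(categories: List[str]) -> List[List[int]]:
    """One hot encoding: one 0/1 column per distinct category, in sorted order."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


ages = impute_median([1.0, None, 3.0])
scaled = standardize(ages)
colors = one_hot(["red", "blue", "red"])

print(ages)    # [1.0, 2.0, 3.0]
print(colors)  # [[0, 1], [1, 0], [0, 1]]  (columns: blue, red)
```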
Hyperparameter optimization
Advanced content
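As a generic illustration of hyperparameter optimization, the sketch below runs a grid search: it tries several candidate values of one hyperparameter and keeps the one with the best validation score. Everything here is an assumption made for the example, including the k-nearest-neighbours model, the candidate grid, the synthetic data, and the function names; it does not describe how any particular tool tunes its models.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Predict by averaging the targets of the k training points nearest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def validation_mse(train, valid, k):
    """Mean squared error on the validation set for a given k."""
    train_x, train_y = train
    valid_x, valid_y = valid
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2
              for x, y in zip(valid_x, valid_y)]
    return sum(errors) / len(errors)


def grid_search(train, valid, k_grid):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: validation_mse(train, valid, k) for k in k_grid}
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical synthetic data: y = sin(x) plus Gaussian noise.
rng = random.Random(1)
xs = [rng.uniform(0, 6) for _ in range(120)]
ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
train = (xs[:80], ys[:80])
valid = (xs[80:], ys[80:])

k_grid = (1, 3, 5, 10, 20)
best_k, scores = grid_search(train, valid, k_grid)
for k in k_grid:
    print(f"k={k:>2}: validation MSE = {scores[k]:.4f}")
print("best k:", best_k)
```

Choosing the hyperparameter on a single validation split can overfit to that split; scoring each candidate with cross validation is a common, more stable alternative.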