Understanding the Process
Learning curves and speed
validation scores on the Y-axis and the percentage of the available data used on the X-axis
lower scores are preferable
Apply to reality
calculate cost
data gets stale over time
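A minimal sketch of reading a learning curve, assuming lower validation scores are better as noted above. The sample percentages and scores are hypothetical, and `marginal_gains` is an illustrative helper, not part of any tool. It shows how much the score improves each time more data is used, which is the figure to weigh against the cost of gathering that data.

```python
# Hypothetical (percent of data used, validation score) points; lower is better.
curve = [(16, 0.62), (32, 0.58), (64, 0.56)]


def marginal_gains(points):
    """Return (from_pct, to_pct, score_drop) for each consecutive pair of points."""
    gains = []
    for (p0, s0), (p1, s1) in zip(points, points[1:]):
        # A positive drop means the score improved (got lower) with more data.
        gains.append((p0, p1, round(s0 - s1, 4)))
    return gains


for start, end, drop in marginal_gains(curve):
    print(f"{start}% -> {end}%: score improved by {drop}")
```

If the improvement shrinks with each step, as in these made-up numbers, additional data may not be worth its cost.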
Accuracy tradeoffs
This screen addresses an important question: how rapidly will the model
evaluate new cases once it is put into production?
start by calculating the speed of the slowest model
Then compare this result with the prediction speed
that the project actually requires
If the slowest model produces results
more rapidly than needed, ignore speed as a criterion in model creation.
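The comparison above can be sketched as a simple throughput check. The function name, its parameters, and the numbers are assumptions for illustration; the real timing would come from measuring the slowest model.

```python
def speed_matters(slowest_seconds_per_1000, required_predictions_per_second):
    """Return True if the slowest model cannot keep up with the required rate."""
    # Convert "seconds per 1000 predictions" into predictions per second.
    slowest_rate = 1000 / slowest_seconds_per_1000
    return slowest_rate < required_predictions_per_second


# Hypothetical numbers: the slowest model needs 0.5 s per 1000 predictions,
# and the project needs 200 predictions per second.
if speed_matters(0.5, 200):
    print("Speed is a criterion: at least one model is too slow.")
else:
    print("Ignore speed: even the slowest model is fast enough.")
```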
Blueprints
Each of the models seen earlier employs a set of pre-processing steps
unique to that type of model.
Imputation
the indicator feature contains True if the row's value was imputed
and False if it was not
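A minimal sketch of imputation with an indicator feature. Filling with the median is an assumption made here for illustration; the notes do not say which fill value a given blueprint uses, and `impute_with_indicator` is a hypothetical helper.

```python
from statistics import median


def impute_with_indicator(values):
    """Replace None with the median of the observed values and flag imputed rows."""
    observed = [v for v in values if v is not None]
    fill = median(observed)  # assumption: median imputation
    imputed = [fill if v is None else v for v in values]
    indicator = [v is None for v in values]  # True where a value was imputed
    return imputed, indicator


ages = [30, None, 50, 40]  # hypothetical column with one missing value
imputed, indicator = impute_with_indicator(ages)
print(imputed)    # [30, 40, 50, 40]
print(indicator)  # [False, True, False, False]
```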
Standardization
click the Standardize box to see that, after missing values are imputed,
all numeric features are standardized
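Standardization is commonly done by subtracting the mean and dividing by the standard deviation. The sketch below assumes that definition (the notes do not spell out the formula) and uses a hypothetical `standardize` helper on an already imputed column.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


incomes = [2.0, 4.0, 6.0]  # hypothetical numeric feature, no missing values
print(standardize(incomes))
```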
One hot encoding
for any categorical feature that fulfills certain requirements, a new feature is
created for every category that exists within the original feature.
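One-hot encoding as described above can be sketched in a few lines. The `is_` prefix and the `one_hot` helper are naming choices made for this example, and the sketch encodes every feature it is given rather than checking the unspecified requirements.

```python
def one_hot(values):
    """Create one 0/1 feature per distinct category in the original feature."""
    categories = sorted(set(values))
    return {f"is_{c}": [int(v == c) for v in values] for c in categories}


colors = ["red", "blue", "red"]  # hypothetical categorical feature
print(one_hot(colors))
# {'is_blue': [0, 1, 0], 'is_red': [1, 0, 1]}
```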