Chapter 16: Understanding the Process
Learning Curves and Speed
Before settling on the selected models, it is important to understand whether more data would improve their predictive ability. Additional data comes in two forms: additional features and additional cases. A good general rule is diminishing marginal improvement: the more relevant data available at the outset of a project, the less likely additional data is to improve predictive performance.
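Diminishing marginal improvement can be seen by comparing accuracy gains as the training set grows. The sketch below uses hypothetical accuracy figures (the function name and numbers are assumptions, not from the text):

```python
# Hypothetical illustration of diminishing marginal improvement:
# each doubling of training data adds less accuracy than the last.
def marginal_gains(accuracies):
    """Return the accuracy gain from each successive data increment."""
    return [round(b - a, 3) for a, b in zip(accuracies, accuracies[1:])]

# Assumed accuracies measured at 1k, 2k, 4k, 8k training cases (hypothetical).
acc = [0.71, 0.78, 0.81, 0.82]
gains = marginal_gains(acc)
print(gains)  # each gain is smaller than the last
```

When the gains flatten out like this, collecting more cases is unlikely to pay for itself, and attention is better spent elsewhere.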
Accuracy Tradeoffs
This screen addresses an important question: how rapidly will the model evaluate new cases once it is put into production? An "Efficient Frontier" line has been added to illustrate which models are best. The Efficient Frontier is the line drawn through the dots closest to the X- and Y-axes. It typically appears when two criteria (here, speed and accuracy) are negatively related to each other, so that the best solution for one is usually not the best solution for the other.
Always look for the efficient frontier line. If time is a factor, follow the frontier to the left until you find the fastest model that is still reasonably accurate. Do note, however, that not all speed issues are unchangeable: adding more prediction servers is often a straightforward way to increase prediction throughput.
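The frontier itself is just the set of models that no other model beats on both criteria at once. A minimal sketch of finding it, with hypothetical model names, prediction times, and error rates:

```python
# A minimal sketch of the efficient frontier for models scored on
# prediction time (lower is better) and error (lower is better).
# Model names and numbers are hypothetical.
def efficient_frontier(models):
    """Return names of models not dominated on both time and error."""
    frontier = []
    for name, time, err in models:
        dominated = any(
            t <= time and e <= err and (t < time or e < err)
            for n, t, e in models if n != name
        )
        if not dominated:
            frontier.append(name)
    return frontier

models = [
    ("XGBoost",      45.0, 0.210),  # slow but most accurate
    ("ElasticNet",    2.0, 0.260),  # fast but less accurate
    ("RandomForest", 30.0, 0.235),  # in between on both criteria
    ("DecisionTree",  5.0, 0.290),  # slower AND less accurate than ElasticNet
]
print(efficient_frontier(models))  # DecisionTree is dominated, the rest are on the frontier
```

Following the frontier "to the left," as described above, means walking through the undominated models in order of decreasing prediction time until accuracy is no longer acceptable.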
Blueprints
The Blueprint pane shows the XGBoost model that did better than all other models except the blender models built on top of it, which combine this XGBoost model with two to seven other models.
Imputation
This indicator variable allows the algorithm to look for predictive signal in which patients are missing information for a given feature.
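The idea can be sketched as median imputation paired with a 0/1 missingness flag, so the model can learn both from the filled-in value and from the fact that it was missing. The function name and column values below are hypothetical:

```python
import numpy as np

# A minimal sketch of median imputation plus a missingness indicator.
# The column values here are hypothetical.
def impute_with_indicator(col):
    """Fill missing values with the median; return a 0/1 missing flag too."""
    col = np.asarray(col, dtype=float)
    missing = np.isnan(col)
    filled = np.where(missing, np.nanmedian(col), col)
    return filled, missing.astype(int)

lab_value = [4.2, np.nan, 5.0, np.nan, 4.6]
filled, flag = impute_with_indicator(lab_value)
print(filled)  # missing entries replaced by the median, 4.6
print(flag)    # 1 marks the rows that were originally missing
```

If missingness itself predicts the target (for example, patients too ill to complete a test), the indicator column lets the algorithm exploit that pattern directly.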
Standardization
Each feature is therefore "scaled," which means that the mean value of the feature is set to zero and the standard deviation is set to "unit variance," which is a fancy way of saying 1.
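Concretely, scaling subtracts each feature's mean and divides by its standard deviation. A minimal sketch (the feature values are hypothetical):

```python
import numpy as np

# A minimal sketch of standardization: subtract the mean and divide by
# the standard deviation, so the result has mean 0 and unit variance.
def standardize(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

ages = np.array([20.0, 30.0, 40.0, 50.0, 60.0])  # hypothetical feature
z = standardize(ages)
print(z.mean())  # ~0
print(z.std())   # ~1
```

After scaling, features measured in very different units (age in years, income in dollars) contribute on a comparable footing.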
One-Hot Encoding
This setting specifies the minimum number of records required for a category to be represented in one-hot encoding.
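A minimal sketch of one-hot encoding with such a minimum-count threshold; grouping rare categories into an "other" bucket is an assumption here, as are the category names and the threshold:

```python
from collections import Counter

# A minimal sketch of one-hot encoding with a minimum-count threshold:
# categories with fewer records than min_count are grouped into "other".
# Category names and the threshold are hypothetical.
def one_hot(values, min_count=2):
    counts = Counter(values)
    bucket = lambda v: v if counts[v] >= min_count else "other"
    cats = sorted({bucket(v) for v in values})
    return [[1 if bucket(v) == c else 0 for c in cats] for v in values]

admission = ["ER", "ER", "clinic", "ER", "referral"]
# "clinic" and "referral" appear only once each, so both fall into "other".
print(one_hot(admission))
```

The threshold keeps rare categories from producing near-empty columns that add noise without adding signal.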
Hyperparameter
This is the step where AutoML provides one of its most important contributions to machine learning: hyperparameter optimization.
It is possible to calculate how many different parameter
combinations are available for fine-tuning an XGBoost model.
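The count is simply the product of the number of candidate values per hyperparameter. In the sketch below, the hyperparameter names are real XGBoost parameters, but the candidate values (and therefore the total) are hypothetical:

```python
# Counting candidate combinations for tuning an XGBoost model.
# Parameter names are real XGBoost hyperparameters; the candidate
# value lists are hypothetical choices for illustration.
grid = {
    "learning_rate":    [0.01, 0.05, 0.1, 0.3],
    "max_depth":        [3, 5, 7, 9],
    "n_estimators":     [100, 500, 1000],
    "subsample":        [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)
print(n_combinations)  # 4 * 4 * 3 * 3 * 3 = 432
```

Even this modest grid yields hundreds of combinations, which is why automated search over hyperparameters is so valuable: exploring the space by hand quickly becomes impractical.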