Chapter 16. Understanding the Process
16.1 Learning Curves and Speed
Before settling in with the selected models, it is important to understand whether more data would improve the models’ predictive ability.
There are two forms of additional data: additional features and additional cases. A good general rule is that of diminishing marginal improvement: the more relevant data available at the outset of a project, the less likely it is that additional data will improve predictive performance.
Now go to the menu option next to "Leaderboard" and select Learning Curves.
This screen shows the validation scores on the Y-axis and the percentage of the available data used on the X-axis. Remember that, in this case, lower scores on the Y-axis are preferable because LogLoss is a "loss" measure: every mistake in prediction increases the loss score.
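To make the "lower is better" point concrete, here is a minimal sketch of the standard binary LogLoss formula (this illustrates the metric itself, not DataRobot's internal implementation):

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average binary log loss; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A confidently wrong prediction inflates the loss far more than a cautious one.
print(log_loss([1, 0, 1], [0.9, 0.1, 0.8]))   # good predictions: low loss
print(log_loss([1, 0, 1], [0.9, 0.1, 0.05]))  # last prediction confidently wrong: much higher loss
```

Note how a single confident mistake dominates the average, which is why LogLoss rewards well-calibrated probabilities.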
These models will soon be examined further to understand their performance metrics, but for now, the performance of the Elastic-Net model can be examined more simply at the three data levels (16%, 32%, 64%).
16.2 Accuracy Tradeoffs
Now select the Speed vs. Accuracy tab in the sub-menu under Models.
This screen addresses an important question: how rapidly will the model evaluate new cases after being put into production? An "Efficient Frontier" line has been added to illustrate which models are best. The Efficient Frontier is the line drawn between the dots closest to the X- and Y-axes, and it usually arises when two criteria (speed and accuracy, in this case) are negatively related to each other: the best solution for one is usually not the best solution for the other. This line will not appear on your screen; it has been added manually for this book to illustrate which models to pay attention to, both when learning and when using DataRobot in real-world applications.
A model must be capable of producing predictions as rapidly as new cases arrive. To evaluate the best models, the Speed vs. Accuracy screen shows the top 10 models by validation score. The Y-axis, as in the learning-curves discussion, displays the LogLoss score (the optimization measure selected), and the X-axis shows the time to score 2,000 records in milliseconds.
In this case, the slowest model (furthest to the right, at the bottom), the Advanced AVG blender, requires eight other models to be scored first before computing the average result, so it should come as no surprise that it is a bit slower than the individual models.
As before, start by calculating the speed of the slowest model (assuming, as is often the case, that it is also the most accurate model) by converting the numbers into cases per second.
Then compare this result with the predictive needs of the project at hand. Speed must be evaluated against peak-time prediction needs: if the slowest model produces results more rapidly than required even at peak times, speed can be ignored as a criterion in model selection. Conversely, if fast responses are not needed, it may be acceptable for the model to fall behind during peaks in prediction demand.
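The conversion itself is simple arithmetic. A sketch, where the 2,000-record figure matches the X-axis of the screen but the elapsed time and peak demand are hypothetical values chosen for illustration:

```python
def cases_per_second(records, elapsed_ms):
    """Convert a 'time to score N records in milliseconds' figure into throughput."""
    return records / (elapsed_ms / 1000.0)

# Hypothetical: the slowest blender scores 2,000 records in 4,500 ms.
throughput = cases_per_second(2000, 4500)
print(round(throughput))  # 444 cases per second

# Compare against a hypothetical peak prediction demand of 300 cases per second.
peak_demand = 300
print(throughput >= peak_demand)  # True: speed can be ignored as a criterion
```

If the comparison came out False instead, speed would become a real constraint and a faster model on the frontier would be preferable.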
Always look for the Efficient Frontier line. If time is a factor, follow the frontier to the left until the most efficient model is found that is still reasonably accurate.
16.3 Blueprints
After seeing the model creation and scoring process, the model blueprints addressed at the start of this chapter can now be understood more easily. Each of the models seen so far employs a different set of pre-processing steps unique to that type of model.
To examine the blueprints, start by clicking on the name of the XGBoost model currently ranked #5 in the leaderboard. This leads to the screen shown below in Figure 16.4. As with the feature views, only one of these model views can be open at a time.
Right under the blue line stating the name of the algorithm used to create this particular model is a sub-menu starting with "Blueprint." The Blueprint pane shows the XGBoost model that did better than all other models except the blender models, which are built on top of this XGBoost model and two to seven other models.
It is fair to assume that missing values are imputed either by replacing them with the column average or, as the DataRobot documentation suggests, by assigning an arbitrarily large negative number (for example, -9999) to all missing values, since tree-based algorithms tend to do better with this information.
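The two imputation strategies just described can be sketched in plain Python (the feature name and values below are hypothetical, not taken from the dataset):

```python
# Hypothetical numeric feature with missing values represented as None.
values = [41.0, None, 59.0, None, 33.0]

present = [v for v in values if v is not None]
col_mean = sum(present) / len(present)

# Strategy 1: mean imputation, common for linear models.
mean_filled = [col_mean if v is None else v for v in values]

# Strategy 2: sentinel imputation with an arbitrarily large negative number.
# Tree-based models can split the sentinel into its own branch, effectively
# treating "missingness" as information rather than noise.
sentinel_filled = [-9999.0 if v is None else v for v in values]

print(mean_filled)      # missing slots replaced by the column average
print(sentinel_filled)  # missing slots replaced by -9999.0
```

The sentinel approach preserves the fact that a value was missing, which a mean fill erases.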
DataRobot may also either "one-hot-encode" categorical features (as discussed in Chapter 10.1) or convert them to ordinal features (ordered categoricals). Ordinal conversion takes less processing than one-hot encoding and tends to perform as well or better. The process also most likely uses the most important information from the three text features (diag_1_desc, diag_2_desc, and diag_3_desc) after the texts have been processed by the Auto-Tuned Word N-Gram Text Modeler.
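The difference between the two categorical treatments can be shown in a few lines (the feature and its levels are hypothetical examples, not DataRobot's actual transformation code):

```python
# Hypothetical categorical feature with three levels.
age_bucket = ["[50-60)", "[70-80)", "[60-70)", "[50-60)"]

# One-hot encoding: one binary column per category level.
levels = sorted(set(age_bucket))
one_hot = [[1 if v == level else 0 for level in levels] for v in age_bucket]

# Ordinal encoding: a single integer column; cheaper to process, and
# tree-based models can still isolate any level with a pair of splits.
ordinal_map = {level: i for i, level in enumerate(levels)}
ordinal = [ordinal_map[v] for v in age_bucket]

print(one_hot)  # 4 rows x 3 columns of 0/1 indicators
print(ordinal)  # 4 integers in a single column
```

Note that one-hot encoding multiplies the column count by the number of levels, which is the processing cost the text refers to.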
16.4 Hyperparameter Optimization (advanced content)
As the blueprint for the XGBoost model is examined in Figure 16.4, notice that the final step before applying the model to the validation sample is the use of the algorithm itself to create the model. This is the step where AutoML provides one of its single most important contributions to machine learning: hyperparameter optimization.
Click on Advanced Tuning for the XGBoost model to see all 16 of the different parameters available for this XGBoost algorithm (Figure 16.18 contains a few of the parameters). It is possible to calculate how many different parameter combinations are available for fine-tuning an XGBoost model.
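The size of that search space grows multiplicatively with each parameter. A sketch of the calculation, using a small hypothetical grid of XGBoost hyperparameter candidates (not DataRobot's actual tuning values):

```python
from math import prod

# Hypothetical candidate values for four XGBoost hyperparameters.
grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 5, 7],
    "n_estimators": [100, 500, 1000],
    "colsample_bytree": [0.3, 0.7, 1.0],
}

# The total number of combinations is the product of the option counts.
combinations = prod(len(v) for v in grid.values())
print(combinations)  # 4 * 3 * 3 * 3 = 108
```

Even this toy grid of four parameters yields 108 candidate models; with all 16 parameters, exhaustive search quickly becomes infeasible, which is why automated optimization of this step matters.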
This chapter covered the automated process for building candidate models and how they are assessed against one another.