Chapter 16: Understanding the Process
16.1 Learning Curves and Speed
rule of diminishing marginal improvement
the greater the amount of relevant data available at the outset of a project, the less likely it is that additional data will improve predictability
Learning Curves: show the validation scores on the Y-axis and the percent of the available data used on the X-axis
two forms of additional data: additional features and additional cases
The dots in Figure 16.1 represent how a model performed with different amounts of data available to it, indicating whether more data would improve the model's predictive ability
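The idea behind a learning curve can be sketched in plain Python: train on growing fractions of the data and score each model against a fixed validation set. The `fit`/`score` callables and the 75/25 split below are assumptions for illustration, not DataRobot's actual procedure.

```python
import random

def learning_curve(fit, score, X, y, fractions):
    """Train on growing fractions of the data and record validation scores.

    fit(X, y) -> model and score(model, X, y) -> float are caller-supplied;
    the last 25% of rows serve as a fixed validation set (an assumption
    made for this sketch -- real AutoML partitioning differs).
    """
    cut = int(len(X) * 0.75)
    X_train, y_train = X[:cut], y[:cut]
    X_val, y_val = X[cut:], y[cut:]
    points = []
    for frac in fractions:
        n = max(1, int(len(X_train) * frac))   # rows used at this step
        model = fit(X_train[:n], y_train[:n])
        points.append((frac, score(model, X_val, y_val)))
    return points

# Toy model: predict the training mean; score = negative mean absolute error.
random.seed(0)
X = list(range(100))
y = [x + random.gauss(0, 5) for x in X]
fit = lambda X, y: sum(y) / len(y)
score = lambda m, X, y: -sum(abs(m - yi) for yi in y) / len(y)
curve = learning_curve(fit, score, X, y, [0.1, 0.25, 0.5, 1.0])
```

Plotting `curve` (fraction on the X-axis, score on the Y-axis) reproduces the shape discussed above: if the scores are still climbing at 100%, more cases would likely help.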
16.4 Accuracy Tradeoffs
the Speed vs. Accuracy tab in the sub-menu under Models
the efficient frontier: the line drawn through the dots closest to the X- and Y-axes; it usually appears when two criteria (speed and accuracy in this case) are negatively related to each other, so the best solution for one is usually not the best solution for the other
how rapidly the model will evaluate new cases after being put into production
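The frontier on the Speed vs. Accuracy tab is a Pareto frontier: the models that no other model beats on both criteria at once. A minimal sketch, with made-up (time, accuracy) pairs:

```python
def efficient_frontier(points):
    """Keep the (time, accuracy) points no other point dominates.

    Lower prediction time and higher accuracy are both better; a point
    is dominated when another point is at least as fast AND at least
    as accurate.
    """
    frontier = []
    for t, a in points:
        dominated = any(t2 <= t and a2 >= a and (t2, a2) != (t, a)
                        for t2, a2 in points)
        if not dominated:
            frontier.append((t, a))
    return sorted(frontier)

# Hypothetical models as (prediction_time_sec, validation_accuracy).
models = [(0.1, 0.80), (0.2, 0.85), (0.3, 0.84), (0.5, 0.90)]
frontier = efficient_frontier(models)
# (0.3, 0.84) falls off the frontier: (0.2, 0.85) is faster AND more accurate.
```

The surviving points are exactly the dots the tab connects with a line; every dropped point has a strictly better alternative for both criteria.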
16.5 Blueprints
Introduction
the model blueprints addressed at the start of this chapter can now be more easily understood
It is fair to assume that the blueprint handles missing values by imputing them, for example by replacing them with an average value for the column
16.5.1 Imputation
categorical features are one-hot encoded, and numerical features have their missing values imputed
the imputation uses the median value of the feature
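Median imputation can be shown in a few lines of plain Python (a sketch of the idea, not the blueprint's actual implementation):

```python
def impute_median(column):
    """Replace None values with the median of the observed values."""
    observed = sorted(v for v in column if v is not None)
    n = len(observed)
    # Middle value for odd counts, mean of the two middle values for even.
    median = (observed[n // 2] if n % 2 else
              (observed[n // 2 - 1] + observed[n // 2]) / 2)
    return [median if v is None else v for v in column]

filled = impute_median([3, None, 1, 7])  # median of [1, 3, 7] is 3
```

The median is preferred over the mean here because it is robust to outliers: one extreme value shifts the mean but leaves the median untouched.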
16.5.2 Standardization
Each feature is therefore “scaled,” which means that the mean value of the feature is set to zero and the standard deviation is set to “unit variance,” which is a fancy way to say 1
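The scaling step above amounts to subtracting the mean and dividing by the standard deviation, as this small sketch shows:

```python
def standardize(values):
    """Scale a feature to mean 0 and unit variance (std dev 1)."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation, a common choice for scaling.
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

scaled = standardize([2.0, 4.0, 6.0, 8.0])
```

After scaling, the feature's mean is 0 and its variance is 1, so features measured in dollars and features measured in years land on the same footing.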
16.5.3 One-hot Encoding
for any categorical feature that fulfills certain requirements, a new feature is created for every category that exists within the original feature
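One-hot encoding is easy to sketch directly: each category in the original column becomes its own 0/1 column.

```python
def one_hot(column):
    """Expand a categorical column into one 0/1 column per category."""
    categories = sorted(set(column))
    return {cat: [1 if v == cat else 0 for v in column] for cat in categories}

encoded = one_hot(["red", "blue", "red"])
# Two categories in the original feature -> two new 0/1 columns.
```

The "certain requirements" mentioned above typically concern cardinality: a feature with thousands of distinct categories would explode into thousands of columns, so high-cardinality features are usually handled differently.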
16.6 Hyper-parameter Optimization (advanced content)
This is the step where AutoML provides one of its most important contributions to machine learning: hyperparameter optimization
it is possible to calculate how many different parameter combinations are available when fine-tuning an XGBoost model
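That calculation is just the product of the number of candidate values per hyperparameter. The parameter names below are real XGBoost hyperparameters, but the candidate value lists are assumptions chosen for illustration:

```python
from math import prod

# Hypothetical search ranges for a handful of XGBoost hyperparameters.
grid = {
    "max_depth": [3, 5, 7, 9],
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "n_estimators": [100, 300, 500],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}
combinations = prod(len(v) for v in grid.values())  # 4 * 4 * 3 * 3 * 3 = 432
```

Even this modest grid yields 432 candidate models, which is why AutoML systems search the space intelligently rather than exhaustively training every combination.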
Introduction
the focus is on whether or not more data is needed to increase the accuracy of the models produced in the previous chapter