Larsen Textbook (Chapter 17: Evaluate Model Performance (Fraction of…
Evaluate Model Performance
AutoML should improve a user’s understanding of the problem by providing visualization of the interactions between the features and the target.
Presentations to management should never be about explaining how algorithms work; they should be about performance characteristics.
LogLoss: the overall optimization measure (lower values are better).
Fraction of Variance Explained (FVE):
Provides a sense of how much of the variance in the dataset the model has explained; it is equivalent to an R² value.
Expressed as a percent, the metric shows how far the model is from fully explaining who will be readmitted, where 100% would be a complete explanation (to turn an R² value into a percent, multiply it by 100).
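As a rough illustration of the R² reading of this metric, the sketch below computes 1 minus the ratio of unexplained to total variance and converts it to a percent. The function name, the readmission labels, and the predicted probabilities are made up for illustration; an AutoML tool may compute its own FVE variant differently.

```python
def fraction_of_variance_explained(y_true, y_pred):
    """Return 1 - SS_res / SS_tot, the usual R^2 definition."""
    mean_y = sum(y_true) / len(y_true)
    # Variance the model leaves unexplained
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total variance of the target around its mean
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: 1 = readmitted, 0 = not readmitted
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_prob = [0.8, 0.2, 0.3, 0.6, 0.1, 0.7, 0.4, 0.2]

fve = fraction_of_variance_explained(y_true, y_prob)
fve_percent = fve * 100  # multiply by 100 to read it as a percent
print(f"FVE: {fve:.3f} ({fve_percent:.1f}% of variance explained)")
```

For these invented numbers the function returns roughly 0.664, so the model would be about 34 percentage points short of a full explanation. Predicting the mean for every case gives 0, and perfect predictions give 1.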
Many tree-based algorithms build on the logic of the decision-tree classifier, which is to repeatedly find the most predictive feature at that point in the tree and split it into two groups that are as internally homogeneous as possible. The many types of tree-based algorithms are often combinations of hundreds or thousands of decision trees.
The decision tree classifier works through the following steps to create this tree:
1. Find the most predictive feature (the one that best explains the target) and place it at the root of the tree.
2. Split the feature into two groups at the point of the feature where the two groups are as homogeneous as possible.
3. Repeat step 2 for each new branch (box).
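The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or any library's implementation: it assumes Gini impurity as the homogeneity measure, binary 0/1 targets, and numeric features, and the feature values below are hypothetical. In this sketch the recursion re-runs both the feature search and the split on each new branch.

```python
def gini(labels):
    """Gini impurity of 0/1 labels; 0.0 means the group is perfectly homogeneous."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the best split, or None."""
    best = None
    for f in range(len(rows[0])):
        values = sorted(set(row[f] for row in rows))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[2]:
                best = (f, threshold, score)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively; a leaf stores the share of positive cases."""
    split = None
    if len(set(labels)) > 1 and depth < max_depth:
        split = best_split(rows, labels)
    if split is None:
        return {"prob": sum(labels) / len(labels)}
    f, threshold, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


# Hypothetical rows: [number of prior visits, age]; label 1 = readmitted
rows = [[0, 70], [1, 45], [1, 62], [4, 50], [5, 66], [6, 48]]
labels = [0, 0, 0, 1, 1, 1]
tree = build_tree(rows, labels)
print(tree)
```

In this toy dataset the first feature separates the two classes cleanly, so it ends up at the root and both branches become pure leaves after a single split.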
Comparing Model Pairs
A random model (one that has access to no predictive data) would produce a nearly straight line.
To make the best possible decisions, it may be necessary to return to the business problem in Section II to generate ideas on how to solve the prob
With the perfect model, at any cutoff point, one finds only true positive cases, so the “curve” travels immediately along the left bound of the chart. The “curve” remains there until the model begins predicting negative cases, which, in an ROC chart, will start to be predicted as positives as the probability distribution threshold moves to the left.
The random model displayed here confirms that the ROC chart random line does indeed extend from the bottom left to the upper right corner.
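To make the two reference shapes concrete, the following sketch builds ROC points by hand for a perfect model and for a random one. The helper names (`roc_points`, `auc`) and all scores are invented for illustration, and the code assumes no tied scores; it is not how any particular AutoML tool draws its chart.

```python
import random


def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points as the threshold drops."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Rank cases from highest to lowest predicted probability
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1  # curve moves up
        else:
            fp += 1  # curve moves right
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve using the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Perfect model: every positive case is scored above every negative case
perfect_y = [1, 1, 1, 0, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
perfect_curve = roc_points(perfect_y, perfect_scores)
perfect_auc = auc(perfect_curve)

# Random model: scores carry no information about the target
rng = random.Random(0)
random_y = [rng.randint(0, 1) for _ in range(5000)]
random_scores = [rng.random() for _ in range(5000)]
random_auc = auc(roc_points(random_y, random_scores))

print(perfect_curve)
print(f"perfect AUC = {perfect_auc:.2f}, random AUC = {random_auc:.2f}")
```

The perfect model's points stay at a false positive rate of 0 until every positive case has been found, which is the climb along the left bound. The random model's area comes out close to 0.5, consistent with a line running from the bottom left to the upper right corner.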
When deciding which model to select, there are five criteria to consider.
Speed to build model.
Familiarity with model.