Ch. 17 Evaluate Model Performance
Introduction
two autoML criteria
understanding
the model's performance
the environment: a model's business context
learning
can change the leaderboard metric in DataRobot
FVE Binomial: analogous to R-squared
LogLoss
A Sample Algorithm and Model
value in having general understanding of how algorithms work
Decision Tree Classifier
at each step, find the most predictive feature and split the data on it into two groups that are as homogeneous as possible
Steps:
1) Find the most predictive feature and place it at the root of the tree
2) Split on that feature at the point where the 2 resulting groups are as homogeneous as possible
3) Repeat steps 1 and 2 for each new group (box)
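The steps above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not DataRobot's implementation: it assumes Gini impurity as the measure of homogeneity, numeric features, and a small made-up dataset. The names gini, best_split, build_tree, rows and labels are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity (assumed homogeneity measure): 0.0 means a perfectly homogeneous group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels):
    """Steps 1 and 2: try every feature and cut point, keep the split with the purest groups."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for cut in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= cut]
            right = [y for r, y in zip(rows, labels) if r[f] > cut]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, cut, score)
    return best  # (feature index, cut point, weighted impurity) or None

def build_tree(rows, labels, depth=0, max_depth=3):
    """Step 3: repeat the search inside each new group until it is pure or max_depth is reached."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, cut, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= cut]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > cut]
    return {
        "feature": f,
        "cut": cut,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }

# Made-up data: feature 0 separates the classes cleanly, feature 1 does not.
rows = [[1, 10], [2, 20], [3, 10], [4, 20]]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels)
```

On this toy data the root split lands on feature 0, because cutting it at 2 leaves two perfectly homogeneous groups, while any cut on feature 1 leaves both groups mixed.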
ROC Curve
validation and cross-validation scores should not differ wildly
density vs. frequency distribution
mountains on graph should not overlap too much
the threshold: the predicted probability at which DataRobot switches a prediction from negative to positive
shown with a score and a vertical line cutting off the two mountains
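A minimal sketch of how a threshold turns predicted probabilities into positive or negative predictions. The function name apply_threshold and the scores are made up for illustration, and treating a score exactly equal to the threshold as positive is an assumption, not something these notes confirm about DataRobot.

```python
def apply_threshold(probabilities, threshold=0.5):
    """Label a case positive (1) when its predicted probability reaches the threshold."""
    # Assumption: a score exactly at the threshold counts as positive.
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical predicted probabilities for four cases.
scores = [0.10, 0.35, 0.50, 0.80]

at_default = apply_threshold(scores, threshold=0.5)
at_lower = apply_threshold(scores, threshold=0.3)
```

Lowering the threshold from 0.5 to 0.3 moves the vertical line to the left, so the case scored 0.35 flips from negative to positive.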
2x2 matrix aka confusion matrix
True Positive (TP)
True Negative (TN)
accuracy = (TP + TN) / all cases
what proportion of the model's decisions are correct?
False Positive (FP)
False Negative (FN)
Positive Predictive Value (PPV) = TP / (TP + FP)
how often the model is correct when it predicts a positive
True Positive Rate (TPR) = TP / (TP + FN)
aka sensitivity
proportion of positive cases correctly identified
Negative Predictive Value (NPV) = TN / (TN + FN)
proportion of negative predictions that are correct
True Negative Rate (TNR) = TN / (TN + FP)
False Positive Rate (FPR) = FP / (FP + TN)
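The formulas in this branch can be checked with a small worked example. The sketch below computes each metric from the four cells of the 2x2 matrix. The function name confusion_metrics and the counts are invented for illustration and do not come from the chapter.

```python
def ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero (no cases of that kind)."""
    return numerator / denominator if denominator else float("nan")

def confusion_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics from the four cell counts."""
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),  # correct decisions / all cases
        "ppv": ratio(tp, tp + fp),  # positive predictive value
        "tpr": ratio(tp, tp + fn),  # true positive rate, aka sensitivity
        "npv": ratio(tn, tn + fn),  # negative predictive value
        "tnr": ratio(tn, tn + fp),  # true negative rate
        "fpr": ratio(fp, fp + tn),  # false positive rate
    }

# Hypothetical counts for 100 cases.
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```

With these counts, accuracy is 85 / 100 = 0.85, PPV is 40 / 50 = 0.80 and NPV is 45 / 50 = 0.90. FPR and TNR share the denominator FP + TN, so they always sum to 1.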