Chapter 17: Evaluate Model Processes
Introduction (17.1)
There is no way to know in advance which algorithm will come out on top
But you should understand the models at both a technical and a conceptual level
electric bike example
don't need to know how engine works, just how to ride a bike
need to understand model performance in business context
LogLoss
good measure of overall model success, but hard to read as how far predictions are from the target
other evaluation metrics make it easier to see how far predictions are from the target
FVE
Fraction of Variance Explained (Binomial)
how much of the data is explained
how much of the variance is explained
comparable to an R² value
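A minimal sketch of how these two metrics relate. The log loss formula is the standard one. The FVE Binomial function is an assumption: it treats FVE as 1 minus the ratio of the model's log loss to the log loss of a baseline that always predicts the base rate, which is what makes it read like an R² value. The labels and probabilities are made up for illustration.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def fve_binomial(y_true, y_prob):
    """ASSUMED definition: 1 - (model log loss / base-rate-only log loss)."""
    base_rate = sum(y_true) / len(y_true)
    baseline = log_loss(y_true, [base_rate] * len(y_true))
    return 1 - log_loss(y_true, y_prob) / baseline

# Hypothetical toy data
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4]
print("log loss:", log_loss(y_true, y_prob))
print("FVE binomial (assumed definition):", fve_binomial(y_true, y_prob))
```

Under this definition a model that only predicts the base rate scores 0 and a perfect model approaches 1, so the number is easier to interpret than raw log loss.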
Sample Algorithm and Model (17.2)
value in general understanding
decision tree classifier
finds the most predictive feature at each step and splits the data into the 2 most homogeneous groups
three reference models at the bottom
works through three steps
1.) find the most predictive feature, place it at the root of the tree
2.) split into 2 mostly homogeneous groups based on the next feature
3.) repeat step 2
series of If Then statements
yes or no questions which can be applied to data
e.g., identifying patients who may be readmitted
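Step 1 above (finding the most predictive feature) can be sketched as follows. This is an illustration only: it assumes yes/no features and uses Gini impurity to measure how homogeneous the two groups are, which the notes do not specify. The patient rows and column names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly homogeneous)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, features, target):
    """Return the yes/no feature whose split gives the lowest weighted impurity."""
    best_feature, best_score = None, float("inf")
    for f in features:
        yes = [r[target] for r in rows if r[f]]
        no = [r[target] for r in rows if not r[f]]
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if score < best_score:
            best_feature, best_score = f, score
    return best_feature, best_score

# Hypothetical toy data: column names are invented for illustration.
patients = [
    {"prior_admission": 1, "over_65": 1, "readmitted": 1},
    {"prior_admission": 1, "over_65": 0, "readmitted": 1},
    {"prior_admission": 0, "over_65": 1, "readmitted": 0},
    {"prior_admission": 0, "over_65": 0, "readmitted": 0},
]
feature, impurity = best_split(patients, ["prior_admission", "over_65"], "readmitted")
print(feature, impurity)
```

Repeating the same search inside each of the two resulting groups (steps 2 and 3) is what produces the chain of If-Then questions.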
ROC Curve (17.3)
model quality
very important
receiver operating characteristic
other factors determine model success
ROC curve
cross-validation results should be similar to validation results
dynamic because it evaluates the model at many thresholds
Area under the Curve (AUC)
a curve that goes straight up to the top-left corner and then to the right suggests target leakage
good AUC = high TPR and low FPR
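A rough sketch of how the ROC curve and AUC come from sweeping the threshold. It assumes a case is predicted positive when its probability is at or above the threshold, and it uses the trapezoidal rule for the area. The labels and probabilities are hypothetical.

```python
def roc_points(y_true, y_prob):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(y_prob), reverse=True):
        tp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 1)
        fp = sum(1 for y, p in zip(y_true, y_prob) if p >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical toy data
y_true = [1, 1, 0, 1, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
curve = roc_points(y_true, y_prob)
print(curve)
print("AUC:", auc(curve))
```

A model that ranks every positive above every negative produces the points (0, 0), (0, 1), (1, 1) and an AUC of 1.0, which is the suspiciously perfect shape the leakage note describes.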
prediction distribution
places "mountains" onto graph to see where specific cases fall
threshold
best cutoff to separate two outcomes
confusion matrix
counts of true/false positive and true/false negative outcomes
determines accuracy and PPV
positive predictive value
tp/(tp+fp)
means positive predictions are correct x% of the time
true positive rate (tpr)
sensitivity
tp/(tp+fn)
F1 score: harmonic mean of PPV and TPR
2tp/(2tp+fp+fn)
a large F1 does not by itself mean the model is good
negative predictive value (npv)
how often negative predictions are correct
tn/(tn+fn)
true negative rate (tnr)
specificity
tn/(tn+fp)
false positive rate (fpr)
fp/(fp+tn)
Matthews correlation coefficient (MCC)
good indicator even when the classes are imbalanced
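The formulas in this branch can be checked with a small sketch like the one below. The four counts are hypothetical, and the function assumes none of the denominators is zero. Accuracy and MCC use their standard definitions, which the notes name but do not spell out.

```python
import math

def confusion_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics listed above from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "ppv": tp / (tp + fp),   # positive predictive value
        "tpr": tp / (tp + fn),   # sensitivity
        "npv": tn / (tn + fn),   # negative predictive value
        "tnr": tn / (tn + fp),   # specificity
        "fpr": fp / (fp + tn),   # false positive rate
        "f1": 2 * tp / (2 * tp + fp + fn),
        "mcc": (tp * tn - fp * fn)
        / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    }

# Hypothetical counts at one threshold
m = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Because every count depends on the chosen threshold, all of these values change as the threshold moves.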
Using the lift chart for business decisions (17.4)
Sort all validation cases by probability of readmission
Use it to see whether a campaign is worth running
Enable drill down and download actual predictions
Look at CSV
Get rid of holdout sample
Apply to campaign
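A minimal sketch of the idea behind the lift chart, assuming the cases are sorted by predicted probability, cut into equal-sized bins, and compared on average predicted versus average actual outcome. The bin count and the data are hypothetical, and this is not the tool's own implementation.

```python
def lift_table(y_true, y_prob, n_bins=5):
    """Sort cases by predicted probability and average predicted vs. actual per bin."""
    ranked = sorted(zip(y_prob, y_true))  # lowest predicted probability first
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can happen when there are fewer cases than bins
        avg_pred = sum(p for p, _ in chunk) / len(chunk)
        avg_actual = sum(y for _, y in chunk) / len(chunk)
        rows.append((b + 1, len(chunk), avg_pred, avg_actual))
    return rows

# Hypothetical validation cases: predicted readmission probability and actual outcome
y_prob = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
table = lift_table(y_true, y_prob)
for bin_no, count, avg_pred, avg_actual in table:
    print(bin_no, count, round(avg_pred, 2), round(avg_actual, 2))
```

If the top bins hold most of the actual readmissions, targeting only those cases may be enough to justify the campaign.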