Chapters 17 and 18
Chapter 17
Chapter 18
decision tree algorithm
presentations to management should never be about explaining how algorithms work, but rather, about their performance characteristics
overall optimization measure (LogLoss)
ROC curve
does not constitute a reasonable measure of how far predictions are from the target in the average case
more understandable measure, such as Fraction of Variance Explained (FVE) Binomial
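A minimal sketch of how LogLoss and FVE Binomial relate, to make the contrast concrete. It assumes FVE Binomial is defined as 1 minus the ratio of the model's LogLoss to the LogLoss of a baseline that always predicts the base rate; that definition, the function names, and the sample data are illustrative assumptions, not taken from the notes.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def fve_binomial(y_true, p_pred):
    """Assumed definition: 1 - LogLoss(model) / LogLoss(base-rate baseline)."""
    base_rate = sum(y_true) / len(y_true)
    baseline = log_loss(y_true, [base_rate] * len(y_true))
    return 1 - log_loss(y_true, p_pred) / baseline

# Hypothetical labels and predicted probabilities
y = [1, 0, 1, 0]
p = [0.9, 0.1, 0.8, 0.3]
model_loss = log_loss(y, p)
model_fve = fve_binomial(y, p)
```

Under this assumed definition, a value of 0 means the model is no better than always predicting the base rate, and a value of 1 means perfect predictions, which is easier to communicate than a raw LogLoss number.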
The many types of tree-based algorithms are often combinations of hundreds or thousands of decision trees
steps
1. Find the most predictive feature and place it at the root of the tree
2. Split the feature into two groups at the point of the feature where the two groups are as homogeneous as possible
3. Repeat step 2 for each new branch (box)
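The steps above can be sketched in plain Python. This is an illustration only: it assumes Gini impurity as the homogeneity measure (the notes do not name one), binary 0/1 targets, and numeric features; `gini`, `best_split`, and `build_tree` are hypothetical helper names.

```python
def gini(labels):
    """Gini impurity of 0/1 labels; 0.0 means the group is perfectly homogeneous."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_impurity) for the best two-way split."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty groups
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Split recursively until a group is pure or max_depth is reached."""
    split = None
    if depth < max_depth and gini(labels) > 0:
        split = best_split(rows, labels)
    if split is None:
        return {"leaf": True, "prob": sum(labels) / len(labels)}
    f, t, _ = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "leaf": False,
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth),
    }

# Hypothetical data: feature 0 separates the classes cleanly at 2
rows = [[1, 5], [2, 3], [3, 8], [4, 1]]
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels)
```

Here the root ends up on feature 0 because its best split leaves both groups pure. Production tree algorithms add refinements such as pruning and minimum leaf sizes that this sketch omits.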
receiver operating characteristic (ROC)
accuracy
(TP+TN)/all cases
precision
TP/(TP+FP)
negative predictive value
TN/(TN+FN)
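A short sketch of these confusion-matrix measures in code, using the standard definitions: accuracy as (TP+TN)/all cases, precision as TP/(TP+FP), and negative predictive value as TN/(TN+FN). The function name and the counts are hypothetical.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and negative predictive value from raw counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,                # share of all cases classified correctly
        "precision": tp / (tp + fp),                  # share of predicted positives that are real
        "negative_predictive_value": tn / (tn + fn),  # share of predicted negatives that are real
    }

# Hypothetical counts for 100 cases
metrics = confusion_metrics(tp=40, fp=10, tn=45, fn=5)
```

Each measure depends on the classification threshold chosen, so the values change as the threshold moves along the ROC curve.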
AUC: model quality
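One way to see why AUC summarizes model quality: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The sketch below computes it by comparing every positive-negative pair; the function name and sample data are illustrative, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical labels and model scores
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.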
model comparison
overall best model (the ENET Blender, M101)
best non-blender model (XGBoost model M63)
selecting a model
1. Predictive accuracy
2. Prediction speed
3. Speed to build model
4. Familiarity with model
5. Insights