Step 5: Model Evaluation
PREDICT
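Tree predict
predict(tree, newdata = data.train, type = "class")
GLM predict
predict(glm, newdata = data.train, type = "response")
predict() for a GLM has no type = "class"; threshold the predicted probabilities (e.g. at 0.5) to get 0/1 classes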
GENERAL PREDICT CODE
predict(model, newdata = data, type = ...)
Model
= Tree or GLM
Type
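= "class" (tree only: predicted class, e.g. 0 or 1), "prob" (tree only: matrix of class probabilities, one column per class), "link" (GLM default: predictions on the scale of the linear predictor, i.e. log-odds for logistic regression), "response" (GLM: predictions on the scale of the target, i.e. predicted probabilities for a binomial GLM), "terms" (GLM: matrix of each term's contribution to the linear predictor; the intercept part is stored in the "constant" attribute)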
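A minimal sketch of the GLM prediction types in base R, assuming a logistic regression fitted to simulated data; the data frame `data.train`, its columns `x1`, `x2`, `target`, the model name `glm_model` and the 0.5 cutoff are all hypothetical. It shows that "response" is the inverse link of "link", and that the "terms" columns plus the "constant" attribute add back up to the linear predictor.

```r
# Hypothetical simulated data standing in for data.train
set.seed(42)
n <- 200
data.train <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
data.train$target <- rbinom(n, 1, plogis(0.5 + 1.2 * data.train$x1 - 0.8 * data.train$x2))

glm_model <- glm(target ~ x1 + x2, family = binomial, data = data.train)

# type = "link" (the default): linear predictor, i.e. log-odds for a logit link
eta <- predict(glm_model, newdata = data.train, type = "link")

# type = "response": predictions on the target's scale, i.e. probabilities here
prob <- predict(glm_model, newdata = data.train, type = "response")

# type = "terms": one centred column per term; attr "constant" holds the rest
term_contrib <- predict(glm_model, newdata = data.train, type = "terms")

# No type = "class" for a GLM: threshold the probabilities (0.5 is an assumption)
pred_class <- ifelse(prob > 0.5, 1, 0)
head(data.frame(eta, prob, pred_class))
```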
ROC CODE
pROC::roc(data.train$target, predictor)
Builds the ROC curve
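Predictor
= predicted probability of the positive class: predict(tree, newdata = data.train, type = "prob")[, "1"] for a tree (assuming the positive class is labelled "1"), or predict(glm, newdata = data.train, type = "response") for a GLM
The ROC predictor must be a probability (or another continuous score), not a 0/1 class prediction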
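Definition ROC
Plot of the true positive rate (TPR, sensitivity) against the false positive rate (FPR, 1 - specificity) as the classification threshold varies
A good model's curve lies well above the diagonal from (0,0) to (1,1), which is what random guessing produces; the closer it bends toward the top-left corner (0,1), the better
Model fit is summarized by the AUC (area under the ROC curve)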
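To make the TPR/FPR definition concrete, here is a base-R sketch that traces the ROC curve by sweeping the threshold over every distinct predicted probability. It is a hand-rolled alternative to pROC::roc, not its implementation; the simulated data, the model `glm_model` and the helper `roc_points()` are hypothetical.

```r
set.seed(1)
n <- 300
d <- data.frame(x = rnorm(n))
d$target <- rbinom(n, 1, plogis(1.5 * d$x))
glm_model <- glm(target ~ x, family = binomial, data = d)
prob <- predict(glm_model, newdata = d, type = "response")

# (FPR, TPR) at each threshold, from strictest (Inf: nothing predicted 1)
# to loosest (lowest score: everything predicted 1)
roc_points <- function(actual, score) {
  thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))
  pos <- sum(actual == 1)
  neg <- sum(actual == 0)
  tpr <- sapply(thresholds, function(t) sum(score >= t & actual == 1) / pos)
  fpr <- sapply(thresholds, function(t) sum(score >= t & actual == 0) / neg)
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

roc_df <- roc_points(d$target, prob)
plot(roc_df$fpr, roc_df$tpr, type = "l",
     xlab = "False positive rate (1 - specificity)",
     ylab = "True positive rate (sensitivity)")
abline(0, 1, lty = 2)  # diagonal = random guessing
```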
CONFUSION MATRIX
Target = BINOMIAL
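caret::confusionMatrix(data = predictions, reference = factor(data.train$value_flag))
predictions must be a factor of predicted classes with the same levels as the reference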
Target = CONTINUOUS
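For a binomial target, the same table can be sketched in base R with table(), which also makes the accuracy, sensitivity and specificity formulas explicit. The simulated `actual`/`predicted` vectors are hypothetical, and class 1 is assumed to be the positive class (caret::confusionMatrix instead treats the first factor level as positive unless its `positive` argument is set).

```r
set.seed(7)
actual <- rbinom(100, 1, 0.4)
# Hypothetical predictions that agree with actual about 80% of the time
predicted <- ifelse(runif(100) < 0.8, actual, 1 - actual)

# Same levels on both sides so 0 and 1 always appear as rows and columns
lv <- c(0, 1)
cm <- table(Predicted = factor(predicted, levels = lv),
            Actual    = factor(actual, levels = lv))

tp <- cm["1", "1"]; tn <- cm["0", "0"]
fp <- cm["1", "0"]; fn <- cm["0", "1"]

accuracy    <- (tp + tn) / sum(cm)
sensitivity <- tp / (tp + fn)   # true positive rate
specificity <- tn / (tn + fp)   # true negative rate
print(cm)
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```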
Measures
AIC
(Akaike Information Criterion)
Definition
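: assesses the quality of a model relative to related models, based on the log-likelihood (equivalently, the deviance) but penalized for making the model more complicated. ALWAYS use it to compare models fitted to the same data; on its own the number means little. The smaller, the better. AIC = -2 log-likelihood + 2k, where k is the number of estimated parameters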
CODE
AIC(glm)  (or AIC(glm1, glm2) to compare two fitted models)
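A sketch of the AIC formula, assuming two logistic regressions on simulated data (the data frame `d` and models `m_small`, `m_big` are hypothetical). It checks -2 log-likelihood + 2k against AIC(); for a 0/1 target the residual deviance equals -2 log-likelihood, which is why AIC can also be read as deviance + 2k.

```r
set.seed(3)
n <- 150
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.3 + d$x1))

m_small <- glm(y ~ x1, family = binomial, data = d)
m_big   <- glm(y ~ x1 + x2, family = binomial, data = d)

# AIC = -2 * log-likelihood + 2k, k = number of estimated parameters
ll <- logLik(m_big)
k  <- attr(ll, "df")            # intercept + 2 slopes for this model
aic_by_hand <- -2 * as.numeric(ll) + 2 * k

# Compare models fitted to the same data: smaller AIC is preferred
aic_table <- AIC(m_small, m_big)
print(aic_table)
```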
Deviance
Definition
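: measure of goodness of fit for a GLM (analogous to SSE in linear regression; lower is better). "Null deviance" is the deviance when the target is predicted by the sample mean alone (intercept-only model). "Residual deviance" is the deviance of the model that includes the predictors.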
CODE
: drop1(glm, test = "LRT")  (takes the fitted model, not the data)
OUTPUT
: one row per dropped term, with columns [Df, Deviance, AIC, LRT (likelihood ratio test statistic), Pr(>Chi) (p-value)]
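A sketch of null vs residual deviance and drop1(), on simulated data where only `x1` truly affects the target (the data frame `d` and `glm_model` are hypothetical). A large LRT with a small Pr(>Chi) means dropping that term makes the fit significantly worse.

```r
set.seed(11)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.2 + 1.0 * d$x1))   # x2 has no real effect

glm_model <- glm(y ~ x1 + x2, family = binomial, data = d)

# Null deviance: intercept-only model (same prediction, the sample mean, for all)
# Residual deviance: model with the predictors; lower means a closer fit
c(null = glm_model$null.deviance, residual = deviance(glm_model))

# Drop each term in turn; LRT tests the resulting increase in deviance
drop_tab <- drop1(glm_model, test = "LRT")
print(drop_tab)
```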
RMSE Code
sqrt(sum((data$Target_value - predictions)^2) / nrow(data))
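A base-R RMSE sketch with a hypothetical `rmse()` helper and a simulated continuous `Target_value` fitted by lm(); sqrt(mean(...)) is the same as dividing the sum of squared errors by nrow(data).

```r
# Root mean squared error between observed and predicted values
rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

set.seed(5)
d <- data.frame(x = runif(100, 0, 10))
d$Target_value <- 2 + 3 * d$x + rnorm(100, sd = 1.5)   # hypothetical target
lm_model <- lm(Target_value ~ x, data = d)
pred <- predict(lm_model, newdata = d)
rmse(d$Target_value, pred)
```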
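AUC
Definition AUC
: area under the ROC curve; the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case. 0.5 = random guessing, 1 = perfect separation
AUC code
pROC::auc(roc)  (roc = the object returned by roc())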
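AUC can also be computed without pROC from ranks, because it equals the Mann-Whitney probability that a positive case outscores a negative one (ties count 1/2). This sketch uses a hypothetical helper `auc_rank()` and simulated scores, and assumes higher scores mean class 1; under that assumption it should agree with pROC::auc for the same scores.

```r
# AUC via ranks: (sum of positive ranks - n_pos(n_pos + 1)/2) / (n_pos * n_neg)
auc_rank <- function(actual, score) {
  r <- rank(score)                 # average ranks handle tied scores
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(r[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(2)
actual <- rbinom(200, 1, 0.5)
score  <- actual + rnorm(200)      # informative but noisy hypothetical scores
auc_rank(actual, score)
```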
Residuals
Residuals VS Fitted
If the residuals at every fitted value are centered near zero and spread symmetrically above and below it (no trend or funnel shape),
this indicates near-zero mean and constant variance for the residuals, which suggests a good model
Normal QQ
checks the normality of the standardized deviance residuals
A good model's points follow the diagonal line
Points that peel away from the line at the ends indicate heavier tails than normal, so a fatter-tailed distribution may fit better
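A sketch of both residual checks for a GLM, assuming a Poisson regression on simulated counts (the data frame `d` and `glm_model` are hypothetical). It draws residuals vs fitted and a normal Q-Q plot of standardized deviance residuals by hand; plot(glm_model, which = 1:2) gives R's built-in versions of the same two plots.

```r
set.seed(9)
n <- 150
d <- data.frame(x = runif(n, 0, 10))
d$y <- rpois(n, exp(0.5 + 0.2 * d$x))
glm_model <- glm(y ~ x, family = poisson, data = d)

# Residuals vs fitted: look for a band centred on 0 with no trend or funnel
fitted_vals <- fitted(glm_model)
dev_res     <- residuals(glm_model, type = "deviance")
plot(fitted_vals, dev_res, xlab = "Fitted values", ylab = "Deviance residuals")
abline(h = 0, lty = 2)

# Normal Q-Q of standardized deviance residuals against the reference line
std_res <- rstandard(glm_model, type = "deviance")
qq <- qqnorm(std_res)
qqline(std_res)
```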