TAE Week 3
Logistic Regression
used when the response variable is categorical (binary): the model predicts a 1/0 outcome
linear part of the model calculates the log-odds of a specific event
- log-odds = beta0 + beta1*x1 + beta2*x2 + … + betam*xm = log(p / (1 - p))
- odds of success = p / (1 - p) = exp(log-odds) = exp(beta0 + beta1*x1 + beta2*x2 + … + betam*xm)
- p = odds / (odds + 1) = exp(log-odds) / (1 + exp(log-odds)) = 1 / (1 + exp(-(log-odds)))
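A minimal sketch of the log-odds --> odds --> p chain in R. The coefficients and predictor values are made up for illustration; plogis() and qlogis() are base R's inverse-logit and logit functions.

```r
# Hypothetical coefficients and predictor values, chosen only for illustration
beta <- c(-1.0, 0.8, 0.5)   # beta0, beta1, beta2
x    <- c(1, 2.0, -1.0)     # the leading 1 multiplies the intercept beta0

log_odds <- sum(beta * x)        # beta0 + beta1*x1 + beta2*x2
odds     <- exp(log_odds)        # p / (1 - p)
p        <- odds / (1 + odds)    # same as 1 / (1 + exp(-log_odds))

# Base R provides both directions of the transformation
stopifnot(isTRUE(all.equal(p, plogis(log_odds))))   # inverse logit
stopifnot(isTRUE(all.equal(log_odds, qlogis(p))))   # logit

print(c(log_odds = log_odds, odds = odds, p = p))
```

Note the parentheses in the last step: exp(log_odds) / 1 + exp(log_odds) would be evaluated as exp(log_odds) + exp(log_odds), which is not a probability.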
log-odds interpretation
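One common reading of a coefficient, sketched under made-up values: increasing x1 by one unit adds beta1 to the log-odds, which multiplies the odds by exp(beta1). The change in p itself is not constant, because it depends on where you start on the curve.

```r
# Hypothetical intercept and slope, for illustration only
beta0 <- -2
beta1 <- 0.7

log_odds_at <- function(x1) beta0 + beta1 * x1

odds_before <- exp(log_odds_at(3))
odds_after  <- exp(log_odds_at(4))   # x1 increased by one unit

# The ratio of the two odds equals exp(beta1), whatever the starting x1
odds_ratio <- odds_after / odds_before
stopifnot(isTRUE(all.equal(odds_ratio, exp(beta1))))

# The same one-unit step changes p by different amounts at different x1
p_step_low  <- plogis(log_odds_at(1)) - plogis(log_odds_at(0))
p_step_high <- plogis(log_odds_at(4)) - plogis(log_odds_at(3))
print(c(odds_ratio = odds_ratio, p_step_low = p_step_low, p_step_high = p_step_high))
```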
goodness of fit
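null and residual deviance: null deviance measures how well the response variable is predicted by the intercept alone; residual deviance measures how well it is predicted by the intercept and the predictor variables. the lower the residual deviance (relative to the null deviance), the better the fit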
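A small sketch of where these two numbers live on a fitted glm object. The data is simulated here purely as an assumption; null.deviance and deviance are components of the object returned by glm().

```r
# Simulated data (hypothetical), one predictor with a real effect on y
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))

fit  <- glm(y ~ x, family = binomial)   # intercept + predictor
fit0 <- glm(y ~ 1, family = binomial)   # intercept only

null_dev  <- fit$null.deviance   # fit using the intercept alone
resid_dev <- fit$deviance        # fit using intercept and predictor

# The null deviance matches the deviance of the intercept-only model
stopifnot(abs(fit0$deviance - null_dev) < 1e-6)

print(c(null = null_dev, residual = resid_dev))
```

The same two values are printed at the bottom of summary(fit).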
confusion matrix: a matrix of true positive (TP), false positive (FP), true negative (TN) and false negative (FN) counts. the counts change with the threshold you set for the model, so choose the threshold according to which kind of error you want to minimise
- TPR = TP / (TP + FN)
- FNR = FN / (TP + FN)
- TNR = TN / (TN + FP)
- FPR = FP / (TN + FP)
TPR is also known as sensitivity, TNR is also known as specificity. Accuracy = (TN + TP) / all observations
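The rates above can be sketched in base R as follows. The labels, the predicted probabilities and the helper name rates_at are all hypothetical, and a case is classed as 1 when its probability is >= the threshold (one possible convention).

```r
# Hypothetical true labels and predicted probabilities
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05)

# Illustrative helper: confusion-matrix rates at a given threshold
rates_at <- function(actual, prob, threshold) {
  predicted <- as.integer(prob >= threshold)
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  tn <- sum(predicted == 0 & actual == 0)
  fn <- sum(predicted == 0 & actual == 1)
  c(TPR = tp / (tp + fn),
    FNR = fn / (tp + fn),
    TNR = tn / (tn + fp),
    FPR = fp / (tn + fp),
    Accuracy = (tp + tn) / length(actual))
}

# The confusion matrix itself at threshold 0.5
print(table(actual = actual, predicted = as.integer(prob >= 0.5)))

r50 <- rates_at(actual, prob, 0.5)
r25 <- rates_at(actual, prob, 0.25)   # lower threshold: fewer FN, more FP
print(rbind(threshold_0.50 = r50, threshold_0.25 = r25))
```

Lowering the threshold from 0.5 to 0.25 in this toy data removes the one false negative but adds two false positives, which is the trade-off the threshold controls.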
Receiver operating characteristic (ROC) curve: plot TPR on the y axis and FPR on the x axis --> the diagonal line through the middle is where TPR = FPR, which is what a completely random model gives --> ideally you want a ROC curve that sits above the diagonal
- the curve is traced out by varying the threshold, so decide which point you want to be at on the graph (how many false negatives vs false positives you can tolerate) and set your threshold accordingly
a good classifier/predictive model will have a high AUC (area under the ROC curve) --> close to 1 is best (high TPR and low FPR), while 0.5 corresponds to the random diagonal
ideally you want a threshold that separates y = 1 from y = 0 as cleanly as possible, i.e. a point on the curve as close as possible to FPR = 0 and TPR = 1
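A rough base-R sketch of the ideas above: sweep the threshold to trace the ROC points, get the AUC with the trapezoid rule, and pick the threshold whose point is closest to (FPR = 0, TPR = 1). The data is hypothetical, and "closest to the corner" is only one of several possible selection rules.

```r
# Hypothetical true labels and predicted probabilities
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05)

# Thresholds from high to low, so the curve runs from (0, 0) to (1, 1)
thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
tpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(thresholds, function(t) sum(prob >= t & actual == 0) / sum(actual == 0))

# Area under the curve by the trapezoid rule
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

# Threshold whose ROC point is nearest the ideal corner (FPR = 0, TPR = 1)
dist_to_corner <- sqrt(fpr^2 + (1 - tpr)^2)
best <- thresholds[which.min(dist_to_corner)]

# Draw the curve and the random-model diagonal when run interactively
if (interactive()) {
  plot(fpr, tpr, type = "s", xlab = "FPR", ylab = "TPR")
  abline(0, 1, lty = 2)
}

print(c(auc = auc, best_threshold = best))
```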
R Code
Logistic Regression
The predict() function can be used to predict the probability that the market will go up, given values of the predictors. The type="response" option tells R to output probabilities of the form P(Y = 1|X)
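A short sketch of what type = "response" changes, using simulated data as an assumption. Without it, predict() on a glm returns values on the link scale (log-odds); applying plogis() to those gives the same probabilities.

```r
# Simulated data (hypothetical)
set.seed(7)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(0.2 + x))
fit <- glm(y ~ x, family = binomial)

new_x <- data.frame(x = c(-1, 0, 1))

log_odds <- predict(fit, newdata = new_x)                      # default: type = "link"
prob     <- predict(fit, newdata = new_x, type = "response")   # P(Y = 1 | X)

# The two outputs differ only by the inverse-logit transformation
stopifnot(isTRUE(all.equal(unname(plogis(log_odds)), unname(prob))))
print(cbind(new_x, log_odds, prob))
```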
steps:
- glm() with family = "binomial"
- predict(model, data, type = "response")
- ggplot() + ... geom_smooth(method = "glm", se = F, na.rm = T, fullrange = T, method.args = list(family = "binomial"))
- create confusion matrices at different thresholds
- plot the ROC curve with the ROCR package (library(ROCR)) using its prediction() and performance() functions
- calculate the AUC using as.numeric(performance(ROCRpred, measure = "auc")@y.values)
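The steps above can be sketched end to end as follows. The data frame, its column names (x1, x2, y) and the 0.5 threshold are assumptions for illustration, the ggplot step is left out, and the ROCR part only runs if that package is installed.

```r
# Simulated data (hypothetical column names and coefficients)
set.seed(42)
n  <- 500
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- rbinom(n, 1, plogis(-0.3 + 1.2 * df$x1 - 0.8 * df$x2))

# 1. fit the logistic regression
model <- glm(y ~ x1 + x2, data = df, family = "binomial")

# 2. predicted probabilities P(Y = 1 | X)
prob <- predict(model, newdata = df, type = "response")

# 3. confusion matrix at one threshold (repeat for other thresholds)
conf_mat <- table(actual = df$y, predicted = as.integer(prob >= 0.5))
print(conf_mat)

# 4. ROC curve and AUC with ROCR, if it is available
if (requireNamespace("ROCR", quietly = TRUE)) {
  library(ROCR)
  ROCRpred <- prediction(prob, df$y)
  ROCRperf <- performance(ROCRpred, measure = "tpr", x.measure = "fpr")
  if (interactive()) plot(ROCRperf)   # TPR on y axis, FPR on x axis
  auc <- as.numeric(performance(ROCRpred, measure = "auc")@y.values)
  print(auc)
}
```

In practice the model would be fitted on a training set and the confusion matrix, ROC curve and AUC evaluated on held-out data; the sketch reuses the same data only to stay short.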