Deloitte’s Data Analytics

Model Performance

Model Development

Model Refinements

Model Interpretation

Logistic regression

Training versus generalisation error

Reject inference

Variable selection

Classification

Information Value

Distinguish the “good” applicants

This implies that the model is not truly representative

Successful and transparent ways to perform the required binary classification into “good” and “bad”

The data will be split into two parts

Requires a critical view and understanding of the variables and a selection of the most significant ones

Based on the idea that we perform a univariate analysis

The statistical model is required to find the separating line that distinguishes the two categories

The first part will be used for extracting the correct coefficients by minimising the error between model output and observed output

The second part is used for testing the “generalisation” ability of the model
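
The two-part split described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the method used in the original analysis: the data are synthetic, the 70/30 split ratio is arbitrary, and the helper names (split_data, fit_logistic, log_loss) are hypothetical. The coefficients are extracted from the first part by gradient descent on the log-loss, and the same error is then measured on the second part to gauge generalisation.

```python
import math
import random

def split_data(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a training part and a test part."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, lr=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(b0 + b1*x) by gradient descent on the mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(rows)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in rows:
            err = sigmoid(b0 + b1 * x) - y  # model output minus observed output
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def log_loss(rows, b0, b1):
    """Mean log-loss of the fitted model on the given rows."""
    eps = 1e-12
    total = 0.0
    for x, y in rows:
        p = min(max(sigmoid(b0 + b1 * x), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(rows)

# Synthetic data (an assumption for illustration): one explanatory variable x,
# binary outcome y drawn from a logistic model with slope 1.5.
data_rng = random.Random(42)
data = []
for _ in range(400):
    x = data_rng.uniform(-3, 3)
    y = 1 if data_rng.random() < sigmoid(1.5 * x) else 0
    data.append((x, y))

train, test = split_data(data)
b0, b1 = fit_logistic(train)           # coefficients come from the first part only
train_error = log_loss(train, b0, b1)  # training error
test_error = log_loss(test, b0, b1)    # generalisation error on unseen data
print(f"train error: {train_error:.3f}, test error: {test_error:.3f}")
```

A test error much larger than the training error would suggest that the model has fitted the training set without learning rules that carry over to new data.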

Measure of how significant the discriminatory power of a variable is
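
As an illustration of this measure, the sketch below computes the Information Value of a single binned variable using the commonly quoted weight-of-evidence form, IV = sum over bins of (share of goods - share of bads) * ln(share of goods / share of bads). The function name and the bin counts are hypothetical, and the sketch assumes every bin contains at least one "good" and one "bad" applicant.

```python
import math

def information_value(bins):
    """Information Value of one variable.

    bins: list of (n_good, n_bad) counts, one pair per bin or category.
    Assumes every bin has at least one good and one bad observation.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        dist_good = good / total_good          # share of all goods in this bin
        dist_bad = bad / total_bad             # share of all bads in this bin
        woe = math.log(dist_good / dist_bad)   # weight of evidence of the bin
        iv += (dist_good - dist_bad) * woe
    return iv

# Hypothetical counts: goods and bads are distributed differently across bins.
informative = [(400, 20), (300, 30), (100, 50)]
# Hypothetical counts: goods and bads follow the same distribution across bins.
uninformative = [(400, 50), (240, 30), (160, 20)]

iv_informative = information_value(informative)
iv_uninformative = information_value(uninformative)
print(iv_informative, iv_uninformative)
```

A variable whose bins separate goods from bads yields a clearly positive value, while a variable with identical distributions for both groups yields zero, which is why the measure can be used to rank candidate variables in a univariate analysis.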

Predictive Power

Confusion Matrix


Goodness of Fit

Ability to generalise the rules it has learned from the training data set to a new one

Additional measure of predictive power

Contains one response variable and only one explanatory variable

Graph of two histograms

Absence of interactions among explanatory variables

Linearity in the explanatory variables

Inside the exponentials there are no higher-order terms

No terms mixing the variables
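
The assumptions listed above can be read off the standard logistic regression form. The notation below is an assumption, since the original does not show the formula: y = 1 marks one of the two classes, x_1, ..., x_n are the explanatory variables and the betas are the fitted coefficients.

```latex
P(y = 1 \mid x_1, \dots, x_n)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_n x_n)}}
```

The exponent is linear in each x_i: it contains no higher-order terms such as x_i^2 and no mixed terms such as x_i x_j, which is what the absence of interactions among explanatory variables amounts to.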

x is a Boolean variable

Provide further guidance by giving the impact of each individual explanatory variable

One explanatory variable

Obtain the two equations
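
The two equations are not reproduced in the source. A plausible reconstruction, assuming a logistic model with a single Boolean explanatory variable x and the hypothetical symbols p_0 and p_1 for the predicted probability at x = 0 and x = 1, is:

```latex
\ln\frac{p_0}{1 - p_0} = \beta_0 \quad (x = 0),
\qquad
\ln\frac{p_1}{1 - p_1} = \beta_0 + \beta_1 \quad (x = 1)
```

Subtracting the first from the second gives

```latex
\beta_1 = \ln\frac{p_1 / (1 - p_1)}{p_0 / (1 - p_0)}
```

so under these assumptions e^{\beta_1} is the odds ratio between the two values of x, which is one way to read the impact of an individual explanatory variable.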