Deloitte’s Data Analytics
Model Performance
Model Development
Model Refinements
Model Interpretation
Logistic regression
Training versus generalisation error
Reject inference
Variable selection
Classification
Information Value
Distinguish the “good” applicants
This implies that the model is not truly representative
A successful and transparent way to perform the required binary classification into “good” and “bad”
The data will be split into two parts
Requires a critical view and understanding of the variables and a selection of the most significant ones
Based on the idea that we perform a univariate analysis
The statistical model is required to find the separating line distinguishing the two categories
The first part will be used for extracting the correct coefficients by minimising the error between model output and observed output
The second part is used for testing the “generalisation” ability of the model
Measure of how significant the discriminatory power of a variable is
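
To make the Information Value point concrete, here is a minimal sketch of a univariate calculation for one categorical variable. It assumes the commonly used weight-of-evidence form, IV = sum over categories of (share of goods - share of bads) * ln(share of goods / share of bads). The function name, the 1 = "good" / 0 = "bad" coding, and the toy housing data are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def information_value(values, labels):
    """Information Value of one categorical variable.

    values: category of the variable for each applicant
    labels: 1 for a "good" applicant, 0 for a "bad" one (assumed coding)
    """
    good = Counter(v for v, y in zip(values, labels) if y == 1)
    bad = Counter(v for v, y in zip(values, labels) if y == 0)
    n_good = sum(good.values())
    n_bad = sum(bad.values())

    iv = 0.0
    for category in sorted(set(values)):
        # The logarithm is undefined when a category has no goods or no bads;
        # this sketch refuses such input rather than smoothing the counts.
        if good[category] == 0 or bad[category] == 0:
            raise ValueError(f"category {category!r} lacks goods or bads")
        share_good = good[category] / n_good
        share_bad = bad[category] / n_bad
        weight_of_evidence = math.log(share_good / share_bad)
        iv += (share_good - share_bad) * weight_of_evidence
    return iv


# Toy data (hypothetical): owners are mostly good, renters mostly bad.
housing = ["own", "own", "own", "own", "rent", "rent", "rent", "rent"]
outcome = [1, 1, 1, 0, 1, 0, 0, 0]
iv_housing = information_value(housing, outcome)
print(f"IV = {iv_housing:.4f}")
```

A variable whose categories have the same share of goods and bads contributes zero, so a higher value indicates more discriminatory power.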
Predictive Power
Confusion Matrix
Goodness of Fit
Ability to generalise the rules it has learned from the training data set to a new one
Additional measure of predictive power
Contained one response variable and only one explanatory variable
Graph of two histograms
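
As an illustration of the confusion matrix listed under model performance, the sketch below counts the four outcome types for a binary "good"/"bad" classifier. Treating "good" (coded 1) as the positive class, and the function and variable names, are assumptions made for this example only.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for binary labels, with 1 = "good" as the positive class."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1  # good applicant classified as good
        elif a == 0 and p == 1:
            counts["FP"] += 1  # bad applicant classified as good
        elif a == 0 and p == 0:
            counts["TN"] += 1  # bad applicant classified as bad
        else:
            counts["FN"] += 1  # good applicant classified as bad
    return counts


# Hypothetical observed outcomes and model predictions on a held-out set.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(cm, accuracy)
```

Computing the matrix on data the model has not seen during fitting is what ties it to the generalisation ability mentioned above.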
Absence of interactions among explanatory variables
Linearity in the explanatory variables
Inside the exponentials there are no higher-order terms
No terms mixing the variables
x is a Boolean variable
Provide further guidance by giving the impact of each individual explanatory variable
One explanatory variable
Obtain the two equations
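
The one-variable Boolean case can be sketched as follows. The assumption here is that the model has the usual logistic form p(x) = 1 / (1 + exp(-(b0 + b1*x))), with an exponent that is linear in x, and that "the two equations" are the ones obtained by setting x = 0 and x = 1: logit(p0) = b0 and logit(p1) = b0 + b1. The function names and the toy data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def logistic(z):
    """Inverse of logit: maps a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_boolean_logistic(xs, ys):
    """Coefficients of a logistic model with a single Boolean variable.

    With x in {0, 1} the model has two parameters and two groups, so the
    fitted probabilities equal the observed "good" rates in each group.
    """
    group0 = [y for x, y in zip(xs, ys) if x == 0]
    group1 = [y for x, y in zip(xs, ys) if x == 1]
    p0 = sum(group0) / len(group0)  # observed good rate when x = 0
    p1 = sum(group1) / len(group1)  # observed good rate when x = 1
    b0 = logit(p0)                  # equation for x = 0
    b1 = logit(p1) - b0             # equation for x = 1
    return b0, b1


# Toy data (hypothetical): 1 of 4 good when x = 0, 3 of 4 good when x = 1.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [1, 0, 0, 0, 1, 1, 1, 0]
b0, b1 = fit_boolean_logistic(xs, ys)
odds_ratio = math.exp(b1)
print(b0, b1, odds_ratio)
```

Here exp(b1) is the odds ratio between the two groups, which is one way to read the impact of an individual explanatory variable.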