Communicate Model Insights
It is important to distinguish between information useful for understanding the model and information useful to an audience for making business decisions.
Six types of information should be communicated
Areas where a model struggles (potential for improvement through more data, i.e., more features and cases)
Most predictive features for model building
Model quality metrics (e.g., confusion matrix)
Feature types especially interesting to management (e.g., insights into the business problem and unknowns uncovered during the modeling process)
Recommended business actions (i.e., whether to implement the model, which business decisions to implement at various probability thresholds, and how doing so will change practice)
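To make the confusion matrix and the idea of "decisions at various probability thresholds" concrete, here is a minimal sketch in plain Python. The labels, probabilities, and the function name confusion_at_threshold are hypothetical and chosen only for illustration; they do not come from any particular tool.

```python
def confusion_at_threshold(y_true, y_prob, threshold):
    """Count TP, FP, TN, FN when predicting positive at or above the threshold."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


# Hypothetical actual outcomes and predicted probabilities (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.4, 0.65, 0.3, 0.2, 0.55, 0.8, 0.1]

# Show how the trade-off between false positives and false negatives
# shifts as the probability threshold moves.
for threshold in (0.3, 0.5, 0.7):
    print(threshold, confusion_at_threshold(y_true, y_prob, threshold))
```

With this made-up data, lowering the threshold removes false negatives at the cost of more false positives, which is the kind of trade-off a business audience needs to weigh.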
How does regression, or predictive analytics rather, work?
All the algorithms being worked with function conceptually in the same way: they determine the generalizable relationship between the features (or independent variables, if your audience is used to statistics) and the target (or dependent variable) and place those relationships into a model that can be used both to understand those relationships and to predict the outcome of cases not yet encountered.
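As one simple instance of this idea, the sketch below fits a one-feature linear regression by ordinary least squares and then predicts a case the model has not seen. The feature (years of experience), the target (salary in thousands), and all values are hypothetical; real modeling tools use many features and more elaborate algorithms, but the fit-then-predict pattern is the same.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned relationship to a new case."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training cases: feature = years of experience, target = salary (thousands).
years_experience = [1, 2, 3, 4, 5]
salary_thousands = [42, 44, 46, 48, 50]

model = fit_simple_linear(years_experience, salary_thousands)
print(model)              # the relationship, usable for understanding
print(predict(model, 7))  # a prediction for a case not yet encountered
```

The slope and intercept serve the "understand the relationship" purpose, while predict serves the "score new cases" purpose.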
Unlocking the Holdout
Small data sets can sometimes lead in the wrong direction.
Evaluate the model based on data it hasn't seen yet.
If the top models' holdout scores are substantially worse than their cross-validation scores, then there is reason to be concerned about the modeling process.
Best outcome: the ranking of the top models does not change between cross validation and holdout.
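The two checks above (a large drop from cross validation to holdout, and a change in the ranking of the top models) can be sketched as follows. This assumes a metric where higher is better (such as AUC); the model names, scores, tolerance, and the function name compare_cv_to_holdout are all hypothetical.

```python
def compare_cv_to_holdout(scores, tolerance):
    """scores maps model name -> (cv_score, holdout_score); higher is assumed better.

    Returns whether the model ranking is unchanged and which models
    dropped by more than `tolerance` on the holdout sample.
    """
    cv_rank = sorted(scores, key=lambda m: scores[m][0], reverse=True)
    holdout_rank = sorted(scores, key=lambda m: scores[m][1], reverse=True)
    flagged = sorted(
        m for m, (cv, holdout) in scores.items() if cv - holdout > tolerance
    )
    return {"rank_stable": cv_rank == holdout_rank, "flagged": flagged}


# Hypothetical leaderboard scores (illustration only).
leaderboard = {
    "model_a": (0.82, 0.81),
    "model_b": (0.80, 0.79),
    "model_c": (0.79, 0.70),
}

report = compare_cv_to_holdout(leaderboard, tolerance=0.03)
print(report)
```

What counts as a "substantial" drop is a judgment call; the tolerance here is an arbitrary placeholder, not a recommended value.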
Model Quality Metrics
Present data or insights about the model with annotations to make it clearer
Procuring data, cleaning it, and carefully addressing issues
Focus on what a measure means rather than its name
Revisiting internal and external data to improve the model
Not All Features Are Created Equally
Four kinds of features to consider when presenting
Features requiring further examination: involve SME
Immutable features: These are features that are good for modeling but are of no value to management in the event that they want to implement corrective actions. An immutable feature is one that a management team cannot change, such as the number of years that have passed since someone entered a given industry after receiving their bachelor’s degree (in the case of modeling employee turnover).
Features that need to be changed and therefore require a re-run of the models:
Mutable features: These are the features that management could potentially change. Because these are the measures that may be manipulated, it is possible that by doing so, management can improve the status (health, mood, etc.) of the subjects in a given data set.
Management interference with mutable features often comes with complications.
Any time management changes the environment in which a model operates in order to make an improvement, that model then loses efficacy. However, in the interest of consistent improvements, data scientists must accept this reality and carefully monitor model performance after implementation. Some interventions are less harmful to model performance than others.
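One simple way to monitor model performance after implementation is to compare each period's accuracy against the accuracy measured before the intervention. The sketch below is a minimal illustration under that assumption; the baseline, the monthly figures, the allowed drop, and the function name flag_degradation are hypothetical.

```python
def flag_degradation(baseline_accuracy, period_accuracies, max_drop):
    """Return the indices of periods whose accuracy fell more than max_drop below baseline."""
    return [
        i
        for i, accuracy in enumerate(period_accuracies)
        if baseline_accuracy - accuracy > max_drop
    ]


# Hypothetical accuracy before the intervention and in the months after it.
baseline = 0.85
monthly_accuracy = [0.84, 0.83, 0.78, 0.76]

# Months (0-indexed) in which the model has degraded enough to warrant review.
degraded_months = flag_degradation(baseline, monthly_accuracy, max_drop=0.05)
print(degraded_months)
```

A flagged period does not say why the model degraded; it only signals that the model may need to be retrained on data that reflects the changed environment.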