Chapter 19: Interpret Model
19.1 Feature Impacts on Target
The overall impact of a feature without consideration of the impact of other features
This treats each feature as a standalone effect on the target.
Such standalone measures provide a useful way to sort features but should not be relied on for feature selection or model interpretation.
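To make the standalone idea concrete, here is a minimal sketch that scores each feature only by its own association with the target, using absolute Pearson correlation against a 0/1 readmission label. The choice of statistic is an assumption for illustration; DataRobot's univariate measure may be computed differently, and the column names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no standalone signal
    return cov / (sx * sy)

def univariate_scores(columns, target):
    """Score each feature on its own, ignoring every other feature; sort high to low."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: 1 = readmitted, 0 = not readmitted.
target = [1, 0, 1, 1, 0, 0, 1, 0]
columns = {
    "num_prior_visits": [4, 1, 3, 5, 0, 1, 4, 2],
    "num_lab_procedures": [40, 41, 39, 42, 40, 38, 41, 39],
    "prior_visits_copy": [4, 1, 3, 5, 0, 1, 4, 2],  # redundant duplicate column
}
for name, score in univariate_scores(columns, target):
    print(f"{name}: {score:.2f}")
```

The redundant copy of `num_prior_visits` scores exactly as high as the original. A standalone score cannot tell that the second column adds nothing once the first is already in the model, which is why such scores are fine for sorting but not for feature selection.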
The overall impact of a feature adjusted for the impact of other features
Compute Feature Impact
calculates the value of each feature in the context of the model
The most important feature is scaled to a score of 100%, and the other features are scaled relative to it
Creating models with fewer features is generally a good idea to avoid overfitting, and can also reduce problems due to changes in the databases and sources of data.
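Permutation importance is one common way to measure a feature's value in the context of a fitted model: shuffle that feature's values, re-score, and record how much the error grows. The sketch below does this against a hypothetical fixed `predict` function on toy data, then scales the results so the top feature reads 100, mirroring the scaling described above. It is an illustration under those assumptions, not DataRobot's implementation.

```python
import math
import random

def log_loss(y_true, p_pred):
    """Mean binary cross-entropy; lower is better."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def predict(row):
    """Hypothetical fixed model: a logistic function of two features."""
    z = -3.0 + 1.2 * row["num_prior_visits"] + 0.01 * row["num_lab_procedures"]
    return 1.0 / (1.0 + math.exp(-z))

def feature_impact(rows, y, features, n_repeats=20, seed=0):
    """Permutation impact per feature, scaled so the top feature scores 100."""
    rng = random.Random(seed)
    baseline = log_loss(y, [predict(r) for r in rows])
    raw = {}
    for f in features:
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[f] for r in rows]
            rng.shuffle(shuffled)
            # Copy each row, replacing only feature f with a shuffled value.
            permuted = [{**r, f: v} for r, v in zip(rows, shuffled)]
            increases.append(log_loss(y, [predict(r) for r in permuted]) - baseline)
        # Clamp: shuffling that does not hurt the model counts as zero impact.
        raw[f] = max(0.0, sum(increases) / n_repeats)
    top = max(raw.values()) or 1.0
    return {f: 100.0 * v / top for f, v in raw.items()}

rows = [
    {"num_prior_visits": v, "num_lab_procedures": l}
    for v, l in [(4, 40), (1, 41), (3, 39), (5, 42), (0, 40), (1, 38), (4, 41), (2, 39)]
]
y = [1, 0, 1, 1, 0, 0, 1, 0]
print(feature_impact(rows, y, ["num_prior_visits", "num_lab_procedures"]))
```

Averaging over several shuffles (`n_repeats`) smooths out the randomness of any single permutation, and the fixed `seed` makes the result reproducible.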
The Directional Impact of Features on Target
Whether the presence of a value helps the model by assisting it in predicting readmissions or non-readmissions
This view shows the result of a logistic regression analysis.
While the Variable Effect screen does not provide all the information needed to recreate a model, it does provide what are commonly known as coefficients (labeled here as Effect) for the most important feature characteristics that drive a prediction decision.
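To see what an Effect value means, the sketch below fits a logistic regression on hypothetical 0/1 indicator features with plain batch gradient descent and prints each coefficient's sign as a direction. It only illustrates how coefficients are read (a positive value pushes toward the class coded 1, here readmission); the feature names, data, and fitting method are assumptions, not what DataRobot runs.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights (bias first) for binary logistic regression by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted minus actual
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the mean gradient of the log loss.
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

# Hypothetical 0/1 indicator features; target 1 = readmitted.
names = ["discharged_to_home", "prior_inpatient_visit"]
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [0, 1], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
weights = fit_logistic(X, y)
for name, coef in zip(names, weights[1:]):
    direction = "toward readmission" if coef > 0 else "toward non-readmission"
    print(f"{name}: {coef:+.2f} ({direction})")
```

The toy data are built so the coefficients land near -0.69 and +0.69 (minus and plus ln 2): the sign carries the direction and the magnitude the strength, on the log-odds scale.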
The Partial Impact of Features on Target
Compute Model X-Ray
The left Y-axis contains the frequency of cases in the validation set. The X-axis contains the values of the most predictive feature
For each feature value, DataRobot averages its prediction probabilities and plots the average as a blue cross, labeled “Predicted.” This average probability is read from the right Y-axis
It may be effective to use a new feature as a target
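The quantities in this chart are simple per-bin aggregates. The sketch below, using a hypothetical fixed `predict` function and toy validation rows, counts cases per feature value (the bars on the left Y-axis) and averages the predicted probabilities in each (the “Predicted” crosses read on the right Y-axis). It also computes a partial-dependence curve, a standard way to isolate one feature's partial effect: force that feature to each grid value for every row and average the predictions. Treating each distinct value as its own bin is a simplification; a continuous feature would normally be bucketed first.

```python
import math
from collections import defaultdict

def predict(row):
    """Hypothetical fixed model returning a readmission probability."""
    z = -3.0 + 1.2 * row["num_prior_visits"] + 0.01 * row["num_lab_procedures"]
    return 1.0 / (1.0 + math.exp(-z))

def xray_bins(rows, feature):
    """Per feature value: case count (left axis) and mean predicted probability (right axis)."""
    counts = defaultdict(int)
    prob_sums = defaultdict(float)
    for r in rows:
        counts[r[feature]] += 1
        prob_sums[r[feature]] += predict(r)
    return {v: (counts[v], prob_sums[v] / counts[v]) for v in sorted(counts)}

def partial_dependence(rows, feature, grid):
    """Average prediction when every row's feature is forced to each grid value."""
    return {g: sum(predict({**r, feature: g}) for r in rows) / len(rows) for g in grid}

validation = [
    {"num_prior_visits": v, "num_lab_procedures": l}
    for v, l in [(4, 40), (1, 41), (3, 39), (5, 42), (0, 40), (1, 38), (4, 41), (2, 39)]
]
for value, (count, mean_p) in xray_bins(validation, "num_prior_visits").items():
    print(f"visits={value}: n={count}, predicted={mean_p:.2f}")
print(partial_dependence(validation, "num_prior_visits", grid=[0, 2, 4]))
```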
19.6 The Power of Language
Subject matter expertise is important in the evaluation of text models, since the evaluator must be able to read and interpret the data
The word cloud represents the words that have the highest coefficients
The intensity of the red or blue color indicates the size of a term's coefficient. Hovering over a term shows the coefficient of that specific term.
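The word cloud's encoding can be sketched from a table of per-term coefficients: keep the strongest terms, let the sign pick the color, and let the magnitude set the intensity. The coefficients below are invented, and mapping red to positive (toward the target) and blue to negative is an assumption to check against the tool's own legend.

```python
def word_cloud_styles(coefficients, top_n=3):
    """Keep the top_n terms by |coefficient|; map sign to color, magnitude to intensity in [0, 1]."""
    top = sorted(coefficients.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    largest = max((abs(c) for _, c in top), default=0.0) or 1.0
    styles = {}
    for term, coef in top:
        color = "red" if coef > 0 else "blue"  # assumed convention: red pushes toward the target
        # Keep the raw coefficient too, as a hover tooltip would show it.
        styles[term] = (color, round(abs(coef) / largest, 2), coef)
    return styles

# Hypothetical coefficients for terms from a free-text diagnosis field.
coefs = {"insulin": 0.8, "routine": -0.4, "failure": 1.6, "follow": -1.2, "the": 0.01}
for term, (color, intensity, coef) in word_cloud_styles(coefs).items():
    print(f"{term}: {color}, intensity {intensity}, coefficient {coef:+.2f}")
```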
Stop word: a common word generally assumed to have little value
Such assumptions can be dangerous, so for every text field examined, uncheck the stop-word option to see what would otherwise be filtered out and to confirm that those words are truly stop words.
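A quick way to audit stop-word filtering is to keep the removed tokens instead of discarding them silently. The minimal sketch below uses a small hypothetical stop-word list and a simple regex tokenizer, both assumptions rather than the list or tokenizer any particular tool uses.

```python
import re

# A small hypothetical stop-word list; real tools ship longer, language-specific lists.
STOP_WORDS = {"the", "a", "of", "and", "to", "no", "not", "with"}

def split_tokens(text, stop_words=STOP_WORDS):
    """Tokenize text and return (kept, removed) so the filtering can be audited."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words]
    removed = [t for t in tokens if t in stop_words]
    return kept, removed

kept, removed = split_tokens("Patient not compliant with insulin and no follow-up scheduled")
print("kept:   ", kept)
print("removed:", removed)
```

On this invented clinical note, the list drops “not” and “no”, negations that reverse the meaning of the note, which is exactly the kind of loss that reviewing the filtered words can catch.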
Insights --> Hotspots
The Hotspot screen shows the most relevant (up to four) combinations of features and their effect on the target.
The display resembles a set of Venn diagrams, with the largest and most overlapping hotspots placed in the middle. As with the word cloud, the deeper the blue or red tone, the greater the impact on the target of the particular feature combination that defines a subgroup of patients.
While the Hotspot panel is quite visually impressive, it is not recommended to show this particular screen during presentations due to its exceedingly high level of detail and complexity.
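Each hotspot can be thought of as a rule: a combination of feature conditions that picks out a subgroup whose target rate differs from the overall rate. The sketch below scores one hypothetical rule by coverage (the share of cases it matches) and lift (its target rate divided by the overall rate). How DataRobot discovers and ranks hotspots is not reproduced here; the rule, fields, and data are invented.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge, "==": operator.eq}

def rule_stats(rows, target, conditions):
    """Coverage and lift of the subgroup matching every (feature, op, value) condition."""
    matched = [
        y for r, y in zip(rows, target)
        if all(OPS[op](r[f], v) for f, op, v in conditions)
    ]
    base_rate = sum(target) / len(target)
    if not matched:
        return 0.0, 0.0
    coverage = len(matched) / len(rows)
    lift = (sum(matched) / len(matched)) / base_rate if base_rate else 0.0
    return coverage, lift

rows = [
    {"age": 72, "prior_visits": 3}, {"age": 45, "prior_visits": 0},
    {"age": 80, "prior_visits": 4}, {"age": 60, "prior_visits": 1},
    {"age": 68, "prior_visits": 2}, {"age": 30, "prior_visits": 0},
    {"age": 77, "prior_visits": 0}, {"age": 55, "prior_visits": 3},
]
target = [1, 0, 1, 0, 1, 0, 0, 1]  # 1 = readmitted
rule = [("age", ">=", 65), ("prior_visits", ">=", 2)]
coverage, lift = rule_stats(rows, target, rule)
print(f"coverage={coverage:.0%}, lift={lift:.2f}")
```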
19.8 Reason Codes
The reason codes are a powerful feature that can supplement business decisions.
The left and right thresholds allow for the specification of the probability cutoffs to be used in this view.
Computing reason codes is slower than computing predictions
Reason codes require additional evaluation of why each case received its particular predicted probability.
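For a linear model, one simple way to explain an individual prediction is to measure each feature's contribution to the log-odds relative to an average case, coefficient * (value - mean), and report the largest. The sketch below does that and also applies left and right probability thresholds, computing explanations only for predictions at or below the low cutoff or at or above the high one. The coefficients, means, thresholds, and names are hypothetical, and this is a stand-in technique, not DataRobot's reason-code algorithm.

```python
import math

# Hypothetical fitted logistic model: intercept, coefficients, and training means.
INTERCEPT = -1.0
COEFS = {"prior_visits": 0.9, "age": 0.03, "discharged_to_home": -1.1}
MEANS = {"prior_visits": 1.5, "age": 60.0, "discharged_to_home": 0.6}

def probability(row):
    """Predicted readmission probability from the hypothetical model."""
    z = INTERCEPT + sum(COEFS[f] * row[f] for f in COEFS)
    return 1.0 / (1.0 + math.exp(-z))

def reason_codes(row, low=0.2, high=0.8, top_k=2):
    """Return (probability, top reasons), or (probability, None) when between the thresholds."""
    p = probability(row)
    if low < p < high:
        return p, None  # between the cutoffs: skip the extra explanation work
    # Contribution of each feature to the log-odds, relative to an average case.
    contribs = {f: COEFS[f] * (row[f] - MEANS[f]) for f in COEFS}
    ranked = sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return p, ranked[:top_k]

patient = {"prior_visits": 5, "age": 78, "discharged_to_home": 0}
p, reasons = reason_codes(patient)
print(f"p={p:.2f}")
for feature, effect in reasons or []:
    print(f"  {feature}: {effect:+.2f} log-odds")
```

The early return for cases between the cutoffs reflects the cost noted above: explanations take extra computation per case, so limiting them to confidently low or high predictions keeps that cost down.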