Interpret Model
Compute Feature Impact
- DataRobot does this by randomly shuffling the values of one feature within the validation data segment (thereby removing its ability to have a meaningful impact on the target)
- DataRobot then examines the model's performance relative to the model that retained all the features. The extent to which the model with the randomly shuffled feature does worse than the original model is assigned as that feature's impact score
- This procedure is repeated for every feature (all the features not being examined retain their original data). Once each feature has been scored this way, the most important feature is scaled to a score of 100%, and the other features are scaled relative to it (a sketch of the procedure follows this list)
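To make the mechanics concrete, here is a minimal Python sketch of the shuffle-and-compare procedure, assuming log loss as the error metric and a scikit-learn logistic regression on synthetic data as stand-ins; it illustrates the idea only and is not DataRobot's implementation:

```python
# Minimal sketch of permutation-based feature impact (assumed metric: log loss).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
baseline = log_loss(y_val, model.predict_proba(X_val))

rng = np.random.default_rng(0)
raw_impact = []
for j in range(X_val.shape[1]):
    X_shuffled = X_val.copy()
    rng.shuffle(X_shuffled[:, j])           # shuffle one feature; all others keep their original values
    shuffled = log_loss(y_val, model.predict_proba(X_shuffled))
    raw_impact.append(shuffled - baseline)  # how much worse the model does without this feature's signal

# Scale so the most important feature scores 100%
impact = 100 * np.array(raw_impact) / max(raw_impact)
for j, score in enumerate(impact):
    print(f"feature {j}: {score:.1f}%")
```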
There are four kinds of relationships that are commonly useful for exploring why a model predicts certain outcomes:
- The overall impact of a feature without consideration of the impact of other features
- The overall impact of a feature adjusted for the impact of other features
- Partial impact of a feature
- Directional impact of a feature: whether the presence of a value helps the model by assisting it in predicting readmissions or non-readmissions
Unfortunately, these scores are not fully reliable indicators of the value of a feature. For example, it is quite common for a pair of features (or more) to contain the same information; when one is shuffled, the model can still draw on the other, so both receive deflated impact scores
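The sketch below (hypothetical data; scikit-learn's permutation_importance as a stand-in for Feature Impact) shows this effect: duplicating a column lets either copy cover for the other when shuffled, so both score lower than the original feature would alone:

```python
# Redundant features undermine permutation-based scores (hypothetical data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=1000, n_features=4, n_informative=4,
                           n_redundant=0, random_state=0)
X_dup = np.hstack([X, X[:, [0]]])  # feature 4 is an exact copy of feature 0

model = RandomForestClassifier(random_state=0).fit(X_dup, y)
result = permutation_importance(model, X_dup, y, n_repeats=10, random_state=0)
print(result.importances_mean)  # features 0 and 4 share credit, so each scores low
```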
While the problem and target have been carefully specified, we still do not fully understand what drives that target
Convincing management that a model should be implemented typically includes explaining the answers to “why,” “what,” “where,” and “when” questions that are embedded in the model
The overall impact of a feature without consideration for the impact of other features treats each feature as a standalone effect on the target
The importance score is exceedingly useful because it allows a data scientist to focus attention on the features most likely to yield additional predictive value if the AutoML misinterpreted them, such as through misinterpretation of the variable type: for example, treating a categorical feature as though it were numeric
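A hypothetical pandas example of this kind of type misinterpretation: an integer-coded category (here an invented diag_code column) read in as a number, then recast so downstream tooling treats each code as a distinct level rather than a magnitude:

```python
import pandas as pd

df = pd.DataFrame({"diag_code": [250, 401, 250, 786]})  # codes, not quantities
print(df["diag_code"].dtype)  # int64: a model would treat 401 as "greater than" 250

# Casting to a categorical dtype marks each code as a distinct level
df["diag_code"] = df["diag_code"].astype("category")
print(df["diag_code"].dtype)  # category
```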
In short, while they provide a useful way to sort features, importance scores should not be relied on for feature selection and model interpretation
DataRobot has conveniently placed an option to select a set of top features as a new feature list. All that is necessary is to give the new list a name and select how many of the top features will populate it. A new model run can then be done using this feature list. Creating models with fewer features is generally a good idea to avoid overfitting,[50] and can also reduce problems due to changes in the databases and sources of data the model draws on
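Outside DataRobot, the same "top features" idea can be sketched as follows, assuming synthetic data, a scikit-learn gradient boosting model, and permutation scores as the ranking; the feature count (top_n = 8) is arbitrary:

```python
# Rank features by permutation score, keep the top N, and refit (a sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
scores = permutation_importance(model, X_val, y_val, n_repeats=5, random_state=0)

top_n = 8
top_features = np.argsort(scores.importances_mean)[::-1][:top_n]  # the new "feature list"
reduced = GradientBoostingClassifier(random_state=0).fit(X_train[:, top_features], y_train)
print("reduced-model accuracy:", reduced.score(X_val[:, top_features], y_val))
```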
Generally, the Tree-Based Variable Importance screen in the Insights area does not demand a high degree of scrutiny. It is useful because it is generated using a minimum of processing power, but it applies only to tree-based models and contains less accurate information than what may be retrieved by selecting Feature Impact for the very same models. The Feature Impact pane uses information from any tree-based model to show yet another view of feature importance.[51] As always, these results are best derived from the most accurate model, so go once more to the model leaderboard and search for the word "tree."
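The distinction between the two views can be sketched with a scikit-learn forest standing in for a tree-based model: impurity-based importance (a rough analogue of Tree-Based Variable Importance) is nearly free to compute but tree-only, while permutation-based importance (the rough analogue of Feature Impact) costs more but works for any model; the data and model here are assumptions for illustration:

```python
# Contrast: cheap impurity-based importance vs. permutation-based importance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("impurity-based:", np.round(forest.feature_importances_, 3))  # free byproduct of training, tree-only
perm = permutation_importance(forest, X_val, y_val, n_repeats=10, random_state=0)
print("permutation:   ", np.round(perm.importances_mean, 3))        # slower, model-agnostic
```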