Chapter 17. Evaluate Model Performance
One of the AutoML criteria is understanding & learning, meaning that the AutoML should improve a user’s understanding of the problem by providing visualization of the interactions between the features and the target. Without such an understanding, a user of AutoML will be in no position to present the results of an analysis.
Know how to understand a model's performance as well as a model's business context.
eXtreme Gradient Boosted Trees (XGBoost) with Early Stopping.
Easiest to explain.
Fraction of Variance Explained Binomial
FVE Binomial provides a sense of how much of the variance in the dataset has been explained and is equivalent to an R²-value. In simpler terms, this metric states how far off, percent-wise, the model is from fully explaining who will be readmitted (to turn an R²-value into a percent, multiply it by 100).
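For a binary target, FVE Binomial is generally computed from LogLoss rather than squared error: one minus the ratio of the model's LogLoss to that of a baseline that always predicts the overall readmission rate. A minimal sketch of that calculation, with toy labels and probabilities for illustration only:

```python
import numpy as np

def log_loss(y_true, p):
    """Mean binomial negative log-likelihood."""
    p = np.clip(p, 1e-15, 1 - 1e-15)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def fve_binomial(y_true, p_model):
    """1 - LogLoss(model) / LogLoss(baseline predicting the base rate)."""
    baseline = np.full(len(y_true), y_true.mean(), dtype=float)
    return 1 - log_loss(y_true, p_model) / log_loss(y_true, baseline)

# Toy data: 1 = readmitted, 0 = not readmitted (illustrative values).
y = np.array([1, 0, 0, 1, 0, 0, 0, 1])
p = np.array([0.6, 0.2, 0.3, 0.7, 0.4, 0.1, 0.3, 0.5])

fve = fve_binomial(y, p)
print(f"FVE Binomial: {fve:.4f} ({fve * 100:.1f}% of the target explained)")
```

A model no better than always predicting the base rate scores 0; a perfect model scores 1, which is why the metric reads like an R²-value.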
It does, however, allow a user to reorder the leaderboard by this and several other measures to more holistically evaluate the models produced. In this case, the best model explains only 10% of the target (0.0958 out of 1.00), providing yet another confirmation that the data available for this project are far from perfect or that the problem being addressed is quite challenging.
Presentations to management should never be about explaining how algorithms work, but rather about their performance characteristics.
17.2 A Sample Algorithm and Model
One of the most conceptually accessible algorithms is the Decision Tree Classifier. As displayed in last chapter's Figure 15.12, the tree-based algorithms group was by far the largest. Many tree-based algorithms build on the logic of the decision-tree classifier, which is to repeatedly find the most predictive feature at each step and split it into two groups that are as homogeneous as possible. The many types of tree-based algorithms are often combinations of hundreds or thousands of decision trees. Arguably, if a budding data scientist were to learn only one algorithm, the decision tree would be the one, due to both its conceptual simplicity and its effectiveness.
DataRobot does share the origin of its algorithms and the parameters used therein. To see this, click on the Decision Tree Classifier (Gini) name to access the Decision Tree Classifier model's blueprint, which shows that it is built on the scikit-learn decision tree algorithm.
1. Find the most predictive feature (the one that best explains the target) and place it at the root of the tree.
2. Split the feature into two groups at the point of the feature where the two groups are as homogeneous as possible.
3. Repeat step 2 for each new branch (box).
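The three steps above can be sketched with scikit-learn's DecisionTreeClassifier, the implementation the blueprint references; the dataset here is synthetic and the depth is capped only so the printed rules stay readable:

```python
# criterion="gini" mirrors the Gini-based splitting named on the leaderboard.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary-classification data, for illustration only.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0)
tree.fit(X, y)

# The printed rules show step 1 (the root is the most predictive feature),
# step 2 (a binary split at the most homogeneous point), and step 3
# (the same splitting repeated on each branch).
print(export_text(tree, feature_names=[f"f{i}" for i in range(4)]))
```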
17.3 ROC Curve
Perhaps the single most important screen in DataRobot is the ROC Curve dashboard.
The Receiver Operating Characteristic (ROC) Curve screen, so named because of the ROC curve in the bottom left corner, is where several central measures of model success exist beyond the original optimization metric, LogLoss.
This process can be visualized as follows: each case is assigned a color (green or purple, depending on its true value). After its color is assigned, the case falls atop the existing cases at its assigned probability, building the "mountain" for that color.
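The stacking described above is, in effect, one histogram of predicted probabilities per true class. A minimal sketch with synthetic probabilities (all distribution parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic predicted probabilities: one "mountain" for true positives
# (e.g., readmitted) and one for true negatives.
p_pos = np.clip(rng.normal(0.65, 0.15, 500), 0, 1)
p_neg = np.clip(rng.normal(0.35, 0.15, 1500), 0, 1)

bins = np.linspace(0, 1, 21)
pos_counts, _ = np.histogram(p_pos, bins)
neg_counts, _ = np.histogram(p_neg, bins)
# Stacking each case atop the others at its assigned probability is exactly
# a per-color histogram; where the two histograms overlap, the model cannot
# cleanly separate the classes.
```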
positive predictive value (PPV): of the cases the model predicts to be positive, the share that are truly positive, computed as TP / (TP + FP).
True Positive Rate (TPR): of the cases that are truly positive, the share the model correctly identifies, computed as TP / (TP + FN).
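Both measures fall directly out of the confusion matrix at a given probability threshold. A small sketch using scikit-learn, with illustrative labels and thresholded predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative true labels and thresholded predictions.
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

ppv = tp / (tp + fp)  # of predicted positives, how many are truly positive
tpr = tp / (tp + fn)  # of true positives, how many the model catches
print(f"PPV = {ppv:.2f}, TPR = {tpr:.2f}")
```

Moving the threshold trades the two off: a lower threshold raises TPR but typically lowers PPV, which is exactly the trade-off the ROC screen lets one explore.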
Chapter 18. Comparing Model Pairs
18.1 Model Comparison
To understand the difference between the overall best model (the ENET Blender, M101) and the best non-blender model (the XGBoost model, M63), DataRobot provides a way to examine the two against each other. In the upper left of the screen, select
This panel allows the selection of two models from the leaderboard in addition to auto-selecting the top model, placed in the far left position.
With a perfect model, at any cutoff point one finds only true positive cases, so the "curve" travels immediately up the left bound of the chart. The "curve" remains there until the model begins predicting negative cases, which, in an ROC chart, start to be predicted as positives as the probability threshold moves to the left. The random model displayed here confirms that the ROC chart's random line does indeed extend from the bottom left to the upper right corner.
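The two extremes can be checked numerically: a scorer that perfectly separates the classes achieves an area under the ROC curve of 1.0 (the curve hugs the left and top bounds), while uninformative random scores land near 0.5, the diagonal line. A sketch with synthetic labels:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 1000)  # synthetic binary target

perfect_scores = y.astype(float)  # scores perfectly separate the classes
random_scores = rng.random(1000)  # scores carry no information

auc_perfect = roc_auc_score(y, perfect_scores)  # exactly 1.0
auc_random = roc_auc_score(y, random_scores)    # near 0.5, the diagonal
print(auc_perfect, round(auc_random, 2))
```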
18.2 Prioritizing Modeling Criteria and Selecting a Model
When deciding which model to select, there are five criteria to consider. They are:
Speed to build model.
Familiarity with model.