Chapter 15: Build Candidate Models
Starting the Process of DataRobot
Select the target feature
The desired target feature can be found directly in the feature list; hover over it, then click the "Use as Target" text
Alternatively, the name of the feature can be typed into the target area
DataRobot offers a choice of which metric the produced models are optimized for
This is useful if a customer or client has a preference for the measure on which they will evaluate the performance of the resulting models
Select the orange down-arrow to display the additional metrics (only the top options for this dataset are shown)
LogLoss (accuracy) means that rather than evaluating the model directly on whether it assigns cases (rows) to the correct "label" (F and T), the model is evaluated on the probabilities it generates and their distance from the correct answer
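To make the "distance from the correct answer" idea concrete, here is a minimal sketch of binary log loss in plain Python. The 0/1 encoding of F and T and the example probabilities are assumptions for illustration; this is not DataRobot's own code.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary log loss; y_true holds 0/1 labels, p_pred holds P(label == 1)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical data: T encoded as 1, F encoded as 0.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]  # probabilities close to the true labels
hesitant = [0.6, 0.4, 0.6, 0.4]   # same side of 0.5, but further from the truth

confident_loss = log_loss(labels, confident)
hesitant_loss = log_loss(labels, hesitant)
```

Both sets of predictions would assign every row to the correct label at a 0.5 cutoff, yet the confident model gets the lower (better) log loss because its probabilities sit closer to the true answers.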
Examine advanced options: Click "show Advanced Options"
The only difference between Random and Stratified is that the Stratified option works a bit harder to maintain the same distribution of target values inside the holdout as in the other samples
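A minimal sketch of what a stratified holdout does, assuming a 20% holdout fraction and a hypothetical target column with 20 T rows and 80 F rows. The function name is illustrative and this is not DataRobot's implementation.

```python
import random
from collections import defaultdict

def stratified_holdout(targets, holdout_frac=0.2, seed=0):
    """Return sorted row indices for a holdout with the same target mix as the data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, t in enumerate(targets):
        by_class[t].append(i)
    holdout = []
    for rows in by_class.values():
        rng.shuffle(rows)                       # random choice within each class
        k = round(len(rows) * holdout_frac)     # same fraction taken from each class
        holdout.extend(rows[:k])
    return sorted(holdout)

# Hypothetical target column: 20% T, 80% F.
targets = ["T"] * 20 + ["F"] * 80
holdout = stratified_holdout(targets)
holdout_true_count = sum(targets[i] == "T" for i in holdout)
```

A purely random split would give about 20% T rows in the holdout on average; sampling within each class keeps the proportion the same on every run.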
Partition Feature
this method is for determining exactly which cases are used in different folds
different from the other approaches in that the user must do their own random or semi-random assignment of cases
It is assumed that the allocation to train, validation, and holdout samples has been manually specified
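As a rough illustration of the partition feature idea, the sketch below splits rows according to a column the user filled in beforehand. The column name `partition` and its three values are assumptions made for this example.

```python
def split_by_partition_feature(rows, column="partition"):
    """Group rows by the value of a user-supplied partition column."""
    parts = {}
    for row in rows:
        parts.setdefault(row[column], []).append(row)
    return parts

# Hypothetical rows; the user has already assigned every case by hand.
rows = [
    {"id": 1, "partition": "training"},
    {"id": 2, "partition": "training"},
    {"id": 3, "partition": "validation"},
    {"id": 4, "partition": "holdout"},
    {"id": 5, "partition": "training"},
]
parts = split_by_partition_feature(rows)
```

Nothing here is randomized by the tool: the assignment comes entirely from the column the user prepared.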
The group approach accomplishes much the same as the partition feature, but with some key differences
1. Allows specification of a group membership feature, 2. DataRobot decides where each case is partitioned but always keeps each group together in the same partition
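The group approach can be sketched as follows: the tool, not the user, picks the fold, but rows sharing a group value never get separated. The customer ids and the choice of three folds are hypothetical, and the round-robin assignment is only one possible way to do it.

```python
import random

def group_folds(group_ids, n_folds=5, seed=0):
    """Assign each row a fold number so rows sharing a group id share a fold."""
    rng = random.Random(seed)
    groups = sorted(set(group_ids))
    rng.shuffle(groups)  # random order of groups, not of individual rows
    fold_of_group = {g: i % n_folds for i, g in enumerate(groups)}
    return [fold_of_group[g] for g in group_ids]

# Hypothetical group membership feature: one customer id per row.
customers = ["a", "a", "b", "c", "c", "c", "d", "e", "f", "f"]
folds = group_folds(customers, n_folds=3)
```

Keeping a group in one fold prevents a model from being validated on rows that are near-duplicates of rows it was trained on.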
The date/time option deals with a critical evaluation issue: making sure that all validation cases occur in a time period after the cases used to create the models
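A minimal sketch of a time-based split, assuming each row carries a `date` field and a single hypothetical cutoff date separates training from validation.

```python
from datetime import date

def time_split(rows, cutoff):
    """Train on rows dated on or before cutoff; validate on strictly later rows."""
    train = [r for r in rows if r["date"] <= cutoff]
    valid = [r for r in rows if r["date"] > cutoff]
    return train, valid

# Hypothetical rows with one date each.
rows = [
    {"id": 1, "date": date(2020, 1, 15)},
    {"id": 2, "date": date(2020, 3, 2)},
    {"id": 3, "date": date(2020, 6, 30)},
    {"id": 4, "date": date(2020, 9, 12)},
]
train, valid = time_split(rows, cutoff=date(2020, 6, 30))
```

Because every validation row is later than every training row, the evaluation mimics predicting the future from the past.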
Starting the Analytical Process
Autopilot will implement the standard DataRobot process and likely lead to the best possible results
Keep the Autopilot option selected and click the Start button. See the processes on page 161:
1. "Setting target feature", 2. "Creating CV and Holdout partitions", 3. "Characterizing target variable", 4. "Loading dataset and preparing data", 5. "Saving target and partitioning information", 6. Importance scores are calculated, 7. "Calculating list of models"
As soon as step 7 is done, a new column is added to the feature list: Importance. It provides the first evidence of the predictive value of specific features
This is not a great LogLoss value, but one generally does not expect one feature alone to carry a model.
The green bar is shown in the Importance column
Model Selection Process
Tournament round 1: 16% sample of cases pg 166
Click Models on the upper left of the screen to view finished models
Auto-Tuned Word N-Gram Text Modeler pg 164
Cross Validation: once finished, the models are sorted by their cross-validation score
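Sorting by cross-validation score can be sketched as averaging each model's per-fold scores and ranking on that average. The model names, the five folds, and the LogLoss numbers below are all made up for illustration; lower LogLoss is better.

```python
from statistics import mean

# Hypothetical per-fold LogLoss values for three made-up models (not real results).
fold_scores = {
    "model_a": [0.62, 0.60, 0.61, 0.63, 0.59],
    "model_b": [0.55, 0.57, 0.56, 0.54, 0.58],
    "model_c": [0.70, 0.66, 0.68, 0.69, 0.67],
}

def rank_by_cv(scores):
    """Return (name, mean score) pairs, best (lowest LogLoss) first."""
    averaged = {name: mean(folds) for name, folds in scores.items()}
    return sorted(averaged.items(), key=lambda item: item[1])

leaderboard = rank_by_cv(fold_scores)
```

Averaging across folds makes the ranking less dependent on which rows happened to land in a single validation sample.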
AVG Blender and Advanced AVG Blender pg 172
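As a rough sketch of the blending idea, assuming an average blender simply takes the row-wise mean of the probabilities predicted by its component models. The function name and numbers are illustrative, and the Advanced AVG Blender is not covered here.

```python
def average_blend(predictions):
    """Row-wise mean of the probability lists produced by several models."""
    n_models = len(predictions)
    return [sum(row) / n_models for row in zip(*predictions)]

# Hypothetical predicted probabilities from two models for the same three rows.
model_1 = [0.2, 0.8, 0.5]
model_2 = [0.4, 0.6, 0.7]
blended = average_blend([model_1, model_2])
```

Each blended value lies between the individual models' predictions for that row, so errors that the models do not share tend to partly cancel out.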