Chapter 15: Build Candidate Models
15.1 Starting the process
Start with a walkthrough of DataRobot.
Select the desired target feature with "Use as Target", or type the feature's name into the target field, which auto-populates as you type.
The distribution of the target feature is displayed. In this example, the target has two values: True and False.
Had the target been heavily imbalanced between True and False, DataRobot would have dealt with it automatically by downsampling (randomly removing cases from) the majority class (the most common value).
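A minimal sketch of majority-class downsampling, assuming a binary target and rows stored as Python tuples. `downsample_majority` and its `target_ratio` parameter are hypothetical names for illustration, not DataRobot's API, and DataRobot's own rules for when and how much to downsample may differ.

```python
import random
from collections import Counter


def downsample_majority(rows, label_of, target_ratio=1.0, seed=0):
    """Randomly drop majority-class cases until majority <= target_ratio * minority.

    Hypothetical helper for illustration; assumes a binary target and is
    not DataRobot's actual downsampling rule.
    """
    counts = Counter(label_of(r) for r in rows)
    # most_common(2) -> [(majority_label, count), (minority_label, count)]
    (majority, n_major), (_, n_minor) = counts.most_common(2)
    keep = min(n_major, int(n_minor * target_ratio))
    majority_rows = [r for r in rows if label_of(r) == majority]
    other_rows = [r for r in rows if label_of(r) != majority]
    # Every minority case survives; only a random subset of majority cases does.
    kept = random.Random(seed).sample(majority_rows, keep)
    return other_rows + kept


rows = [("a", True)] * 10 + [("b", False)] * 90
balanced = downsample_majority(rows, label_of=lambda r: r[1])
print(sorted(Counter(r[1] for r in balanced).items()))  # [(False, 10), (True, 10)]
```

Only majority cases are removed, so none of the rare (and usually most informative) cases are lost.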
You could click the Start button at this point, but it is not recommended yet.
DataRobot lets you choose which metric the models are optimized for. Its recommended metric can generally be trusted; click the arrow next to the recommendation to see other options.
Regression: predicting a numeric value
15.2 Advanced options (not recommended for first-time users)
Click "Show advanced options" to reveal settings such as the partitioning method.
Random and Stratified offer identical settings. The difference is that Stratified works harder to keep the distribution of target values in the holdout the same as in the other samples.
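To illustrate the difference, here is a small sketch contrasting a purely random holdout with a stratified one. The function names and the 20% default are assumptions chosen for the example, not DataRobot internals: the random version samples from all cases at once, while the stratified version samples the same fraction separately within each target class.

```python
import random
from collections import defaultdict


def random_holdout(labels, frac=0.2, seed=0):
    """Holdout indices drawn from all cases at once; the class mix can drift by chance."""
    idx = list(range(len(labels)))
    random.Random(seed).shuffle(idx)
    return sorted(idx[: round(len(labels) * frac)])


def stratified_holdout(labels, frac=0.2, seed=0):
    """Holdout indices drawn separately within each class, so the holdout keeps
    (up to rounding) the same class distribution as the full data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    holdout = []
    for idx in by_class.values():
        rng.shuffle(idx)
        holdout.extend(idx[: round(len(idx) * frac)])
    return sorted(holdout)


labels = [True] * 30 + [False] * 70
for pick in (random_holdout, stratified_holdout):
    h = pick(labels)
    # holdout size and number of True cases in it
    print(pick.__name__, len(h), sum(labels[i] for i in h))
```

The stratified holdout always contains exactly 6 True cases here (30% of 20), whereas the random one only does so on average.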
Once the "lockbox" has been filled with holdout cases and locked, the remaining cases are split into n folds. The number of folds can also be set manually.
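The fold split can be sketched as follows, assuming the holdout indices are already known. `assign_folds` and `cv_rounds` are hypothetical helpers: the remaining cases are shuffled and dealt into n folds, and each fold then takes one turn as the validation set while the others are used for training.

```python
import random


def assign_folds(indices, n_folds=5, seed=0):
    """Shuffle the non-holdout case indices and deal them round-robin into n folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [[] for _ in range(n_folds)]
    for pos, i in enumerate(shuffled):
        folds[pos % n_folds].append(i)
    return folds


def cv_rounds(folds):
    """Yield (train, validation) pairs: each fold takes one turn as validation."""
    for k, valid in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid


holdout = set(range(0, 100, 5))  # hypothetical: 20 cases already locked away
remaining = [i for i in range(100) if i not in holdout]
for train, valid in cv_rounds(assign_folds(remaining, n_folds=5)):
    print(len(train), len(valid))
```

Holdout cases never appear in any fold, which is what keeps the lockbox untouched until the final evaluation.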
Partition Feature: a method for specifying exactly which cases go into which folds, using a feature in the dataset.
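A minimal sketch of the idea behind a partition feature, assuming rows are dictionaries and the user has added a column (here hypothetically named `fold`) whose values name the partition each row belongs to. The helper name is illustrative only.

```python
from collections import defaultdict


def partitions_from_feature(rows, feature):
    """Map each distinct value of a user-chosen partition column to the indices
    of the rows carrying that value; the user, not the software, decides
    which case goes into which partition."""
    partitions = defaultdict(list)
    for i, row in enumerate(rows):
        partitions[row[feature]].append(i)
    return dict(partitions)


# Hypothetical column "fold" supplied by the user.
rows = [{"fold": "A"}, {"fold": "B"}, {"fold": "A"}, {"fold": "holdout"}]
print(partitions_from_feature(rows, "fold"))  # {'A': [0, 2], 'B': [1], 'holdout': [3]}
```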
Group: accomplishes much the same as Partition Feature, with two key differences. First, it lets you specify a group membership feature. Second, DataRobot decides which partition each case goes into, but always keeps all cases of a group together in a single partition.
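One way to implement the "keep each group together" rule is sketched below, assuming `groups[i]` holds the group-membership value of case i (for example, a patient ID). `group_folds` is a hypothetical helper and the greedy smallest-fold placement is just one reasonable strategy, not necessarily how DataRobot assigns groups.

```python
import random
from collections import defaultdict


def group_folds(groups, n_folds=5, seed=0):
    """Assign cases to folds so that all cases of a group share one fold.

    Whole groups are shuffled, then each is placed in the currently smallest
    fold, which keeps fold sizes roughly even without splitting any group.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    group_ids = list(members)
    random.Random(seed).shuffle(group_ids)
    folds = [[] for _ in range(n_folds)]
    for g in group_ids:
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(members[g])
    return folds


groups = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
print(group_folds(groups, n_folds=3))
```

Keeping a group in one fold prevents a model from being validated on cases from the same group it was trained on, which would make its scores look better than they really are.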
Date/Time: deals with a critical evaluation issue by making sure that all validation cases come from a time period after the cases used to create the models.
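The core rule can be sketched with a single cutoff date, assuming each row carries a date. `time_split` and the `day` field are hypothetical; DataRobot's date/time partitioning offers more options than one cutoff, so treat this only as an illustration of the "validate on later data" principle.

```python
from datetime import date


def time_split(rows, date_of, cutoff):
    """Train on cases dated before `cutoff`, validate on cases dated on or after it,
    so every validation case is later than every training case."""
    train = [r for r in rows if date_of(r) < cutoff]
    valid = [r for r in rows if date_of(r) >= cutoff]
    return train, valid


rows = [
    {"day": date(2020, 1, 5)},
    {"day": date(2020, 3, 1)},
    {"day": date(2020, 2, 10)},
    {"day": date(2020, 4, 2)},
]
train, valid = time_split(rows, lambda r: r["day"], cutoff=date(2020, 3, 1))
print(len(train), len(valid))  # 2 2
```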
15.3 Starting the Analytical Process
Choose a modeling mode: Autopilot, Quick, or Manual.
Quick: an abbreviated version of Autopilot that shortcuts the best-practice machine learning process. It produces models almost as good as Autopilot's and leaves more time for tasks such as creating a presentation.
Autopilot: implements the standard DataRobot process and is likely to lead to the best results.
Keep the Autopilot option selected and click the Start button. DataRobot then works through these steps:
Then "Setting target feature"
Then "Creating CV and Holdout partitions"
Then "Characterizing target variable"
Then "Loading dataset and preparing data"
Then "Saving target and partitioning information"
By default, the partitioning uses the Stratified method with 5-fold cross-validation and a 20% holdout sample.
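These defaults translate into simple proportions: 20% of cases go to the holdout, each of the five folds holds 16% of the full data (80% ÷ 5), and each cross-validation round trains on four folds (64%) and validates on the fifth (16%). A small sketch of that arithmetic, using a hypothetical helper named `partition_sizes`:

```python
def partition_sizes(n_rows, holdout_frac=0.2, n_folds=5):
    """Approximate row counts for a holdout plus k-fold cross-validation setup.

    Returns (holdout rows, rows per fold, training rows per CV round).
    Fold size is rounded down; any leftover rows would be spread across folds.
    """
    holdout = round(n_rows * holdout_frac)
    remaining = n_rows - holdout
    per_fold = remaining // n_folds
    return holdout, per_fold, remaining - per_fold


print(partition_sizes(10_000))  # (2000, 1600, 6400)
```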
15.4 Model Selection Process
This process runs a broad range of the most effective machine learning algorithms available.
15.4.1 Tournament Round 1
15.4.2 Tournament Round 2
15.4.3 Tournament Round 3
15.4.4 Tournament Round 4
15.4.5 Tournament Round 5