Ch. 15 Build Candidate Models

Starting the Process

select the target feature

hover over and click "Use as Target"

LogLoss (accuracy)

evaluated based on probabilities generated by the model and their distance from the correct answer
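A minimal sketch of the standard binary log loss formula, to show how the score depends on predicted probabilities rather than on right/wrong labels. The function name, the clipping constant `eps`, and the sample data are illustrative assumptions, not taken from DataRobot.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood for binary (0/1) labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical data: both prediction sets pick the correct label at a 0.5
# threshold, but the second set sits closer to 0.5 on every case.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.6, 0.4]

print(log_loss(labels, confident))
print(log_loss(labels, hesitant))
```

Both sets of predictions would classify every case correctly, yet the probabilities that sit farther from the correct answer receive the larger (worse) LogLoss.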

Advanced Options (not recommended for first-time use)

Random and Stratified

Stratified option tries to keep the distribution of target values in the holdout the same as in the other samples
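A rough sketch of what a stratified holdout could look like: each target class is sampled at the same rate, so the holdout keeps the overall class mix. This is an assumption-laden illustration, not DataRobot's implementation; `stratified_holdout`, `holdout_frac`, and the data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_holdout(targets, holdout_frac=0.2, seed=0):
    """Return (train_idx, holdout_idx), sampling every class at holdout_frac."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, t in enumerate(targets):
        by_class[t].append(i)
    train, holdout = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                      # random order within the class
        k = round(len(idx) * holdout_frac)    # per-class holdout size
        holdout.extend(idx[:k])
        train.extend(idx[k:])
    return sorted(train), sorted(holdout)

# Hypothetical target with 10% positives
targets = [1] * 10 + [0] * 90
train_idx, holdout_idx = stratified_holdout(targets)
holdout_rate = sum(targets[i] for i in holdout_idx) / len(holdout_idx)
print(len(holdout_idx), holdout_rate)
```

A purely random split could, by chance, draw more or fewer positives into the holdout; sampling within each class removes that variation.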

Partition Feature

determining exactly which cases are used in different folds

user supplies their own random or semi-random assignment of cases

Group Approach

allows the specification of a group membership feature

DataRobot makes decisions about where a case is to be partitioned but always keeps each group together in only one partition
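One way to picture group partitioning is to assign folds to groups rather than to individual cases, so every case inherits its group's fold. The sketch below assumes a simple shuffled round-robin assignment; the function name, the fold count, and the `customer_ids` data are hypothetical, and DataRobot's own assignment logic is not described in these notes.

```python
import random

def group_partition(groups, n_folds=5, seed=0):
    """Assign each case a fold number so that no group spans two folds."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)  # randomize which group lands in which fold
    fold_of_group = {g: i % n_folds for i, g in enumerate(unique)}
    return [fold_of_group[g] for g in groups]

# Hypothetical group membership feature: one customer id per case
customer_ids = ["a", "a", "b", "c", "c", "c", "d", "e", "f", "f"]
folds = group_partition(customer_ids, n_folds=3)
for cid, fold in zip(customer_ids, folds):
    print(cid, fold)
```

Because the fold is looked up by group, all cases for customer "c" always share one partition.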

the date/time option deals with a critical evaluation issue: making sure that all validation cases occur in a time period after the time of the cases used to create models
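The date/time idea can be sketched as a cutoff split: cases before the cutoff are available for building models, and cases on or after it are held for validation. The field name `when`, the cutoff date, and the rows are assumptions made for illustration.

```python
from datetime import date

def time_split(rows, cutoff):
    """Split rows so every validation case is dated on or after the cutoff."""
    train = [r for r in rows if r["when"] < cutoff]
    validation = [r for r in rows if r["when"] >= cutoff]
    return train, validation

# Hypothetical cases, deliberately out of date order
rows = [
    {"id": 1, "when": date(2020, 3, 1)},
    {"id": 2, "when": date(2021, 7, 15)},
    {"id": 3, "when": date(2020, 11, 30)},
    {"id": 4, "when": date(2021, 1, 5)},
]
train, validation = time_split(rows, cutoff=date(2021, 1, 1))
print([r["id"] for r in train], [r["id"] for r in validation])
```

With this split, no model is validated on cases older than the ones it was trained on.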

Starting the Analytical Process

prepare the data through the prescribed options: Autopilot, Quick, and Manual

Quick run = abbreviated version of Autopilot

Informative Features

represents all the data features except for the ones automatically excluded and tagged

After the 7 steps of running the analytic process are complete, an 'importance' column is created

importance column indicates relative importance of a feature when examined against the target independently of all other features
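To illustrate "independently of all other features," the sketch below scores each feature against the target on its own and scales the scores so the strongest feature equals 1. It uses absolute Pearson correlation as a simple stand-in; that choice is an assumption, since these notes do not say how DataRobot calculates the column. The function names and data are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def relative_importance(features, target):
    """Score each feature alone against the target, scaled to the top score."""
    raw = {name: abs(pearson(col, target)) for name, col in features.items()}
    top = max(raw.values()) or 1.0  # avoid dividing by zero if all scores are 0
    return {name: score / top for name, score in raw.items()}

# Hypothetical features and binary target
features = {
    "visits": [1, 2, 3, 4, 5, 6],
    "age": [30, 25, 41, 38, 29, 35],
}
target = [0, 0, 0, 1, 1, 1]
scores = relative_importance(features, target)
print(scores)
```

Each score uses only one feature and the target, so interactions between features play no part in it.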

Model Selection Process

James Frainey