Chapter 15: "Build Candidate Models"
Starting The Process (15.1)
Choose target
Select metric to optimize
Log Loss
measures how far the probability generated by the model is from the correct answer
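A minimal sketch of how Log Loss could be computed for a binary target, assuming labels of 0 or 1 and predicted probabilities for class 1. The function name, the clipping constant, and the sample values are illustrative assumptions, not taken from the chapter or from DataRobot.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary labels (0 or 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so log(0) never occurs (eps is an assumed constant).
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical data: the same labels scored by a confident and an unsure model.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
unsure = [0.6, 0.4, 0.5, 0.5]

confident_loss = log_loss(labels, confident)
unsure_loss = log_loss(labels, unsure)
print(confident_loss, unsure_loss)
```

Probabilities closer to the correct answer give a lower score, so the confident model scores better here than the unsure one.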
Advanced Options (15.2)
Partitioning Methods
random
cases assigned to partitions at random, in whatever % is set
stratified
similar to random, but tries to keep the target's distribution the same in each partition
remaining cases split into folds
folds are tricky; the right number depends on data size
DataRobot swears by 5
each fold takes a turn as the validation fold during training
group
specify groups
DataRobot decides which partition each group goes into but keeps every group together
date/time
make sure validation data comes from after the data the model was trained on
avoids target leaks!
holdouts
must have 3+ unique values, otherwise no holdout
i.e., must have training data, validation, and then holdout
Average of 5 validation scores is the cross validation score
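The partitioning notes above can be sketched in plain Python: set aside a holdout, split the remaining cases into 5 stratified folds, and average the 5 validation scores. This is only an illustration, not DataRobot's implementation; the function names, the 20% holdout fraction, the seed, and the example scores are all assumptions.

```python
import random
from collections import defaultdict

def make_partitions(targets, n_folds=5, holdout_frac=0.2, seed=0):
    """Return (folds, holdout) as lists of row indices, stratified by target value."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, t in enumerate(targets):
        by_class[t].append(i)

    folds = [[] for _ in range(n_folds)]
    holdout = []
    for rows in by_class.values():
        rng.shuffle(rows)
        # Take the same share of every class for the holdout (assumed 20%).
        n_hold = round(len(rows) * holdout_frac)
        holdout.extend(rows[:n_hold])
        # Deal the remaining rows of this class round-robin into the folds,
        # so each fold keeps roughly the same target distribution.
        for j, row in enumerate(rows[n_hold:]):
            folds[j % n_folds].append(row)
    return folds, holdout

def cross_validation_score(fold_scores):
    """Average of the per-fold validation scores."""
    return sum(fold_scores) / len(fold_scores)

# Hypothetical target: 20 positive cases and 80 negative cases.
targets = [1] * 20 + [0] * 80
folds, holdout = make_partitions(targets)

# Hypothetical validation scores, one per fold.
cv_score = cross_validation_score([0.30, 0.32, 0.28, 0.31, 0.29])
print([len(f) for f in folds], len(holdout), cv_score)
```

Each fold would serve once as the validation set while the model trains on the other four; the holdout stays untouched until the end.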
Starting The Analytical Process (15.3)
autopilot
similar to quick
quick
a faster version of autopilot
good in a time crunch
manual
informative features
all features except the ones tagged and excluded
steps
set target
create CV and holdout partitions
characterize target variable
load and prepare data
If it's
large, 500 MB+
a 500 MB sample
or rest of dataset is loaded
save target and partition information
analyze
calculate model
order of importance
lowest first
can combine to create something valuable and viable
Model Selection Process (15.4)
"worker"= computer
more processing power, more CPUs
begin creating and running algorithms
like a tournament to find which one creates the best approach
then ranked on a leaderboard
put forward for further rounds if the approach is good
trained on 32%, 64%, etc. of the data until the best algorithm is chosen
cross hedging
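The tournament idea from the model selection notes could look roughly like the sketch below: every candidate is scored on a small sample, ranked on a leaderboard, and only the better ones move on to larger samples. The function name, the 16% starting sample, the keep-half rule, and the toy scoring functions are all assumptions for illustration; DataRobot's actual rules are not given in these notes.

```python
def run_tournament(candidates, sample_sizes=(0.16, 0.32, 0.64), keep=0.5):
    """candidates: dict of name -> function(sample_fraction) returning a score.

    Lower scores are better (for example, Log Loss).
    Returns the leaderboard from the final round as (score, name) pairs.
    """
    survivors = list(candidates)
    leaderboard = []
    for frac in sample_sizes:
        # Score every surviving candidate on this sample size and rank them.
        leaderboard = sorted((candidates[name](frac), name) for name in survivors)
        # Only the better part of the leaderboard advances (assumed rule).
        n_keep = max(1, int(len(leaderboard) * keep))
        survivors = [name for _, name in leaderboard[:n_keep]]
    return leaderboard

# Hypothetical candidates whose score improves as they see more data.
toy = {
    "model_a": lambda frac: 0.50 - 0.20 * frac,
    "model_b": lambda frac: 0.45 - 0.05 * frac,
    "model_c": lambda frac: 0.60 - 0.10 * frac,
    "model_d": lambda frac: 0.55 - 0.30 * frac,
}

final_board = run_tournament(toy)
winner = final_board[0][1]
print(final_board, winner)
```

One consequence of this design is that a candidate eliminated early never gets to show how it would do on more data, which is the price paid for not training every algorithm on the full dataset.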