Chapter 16: Understanding the Process
Learning Curves and Speed (16.1)
  more data = more predictability?
    achieved through
      additional features
      additional cases
    rule of thumb
      the more relevant data you have at the outset, the less likely you are to need more later
  learning curves (sketched in code below)
    y-axis
      validation scores (smaller is better for error metrics such as LogLoss)
    x-axis
      % of the training data the model was exposed to
    each point answers: how did the model do with x amount of data?
    predictive accuracy is generally better with more data
    cross-validation scores become more reliable as the amount of data grows
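Below is a minimal learning-curve sketch in Python using scikit-learn (the book works in DataRobot, which draws these curves automatically; the dataset and model here are purely illustrative):

```python
# A minimal learning-curve sketch using scikit-learn, not DataRobot.
# All data and model choices here are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Train on 10%, 25%, 50%, 75%, and 100% of the available training data
# and record the validation score at each step (LogLoss: smaller is better).
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X, y,
    train_sizes=[0.1, 0.25, 0.5, 0.75, 1.0],
    cv=5,
    scoring="neg_log_loss",
)

for n, score in zip(sizes, val_scores.mean(axis=1)):
    # Negate: scikit-learn reports negative log loss so that bigger is better.
    print(f"{n:5d} rows -> validation LogLoss {-score:.4f}")
```

Typically the printed LogLoss shrinks as the row count grows, which is the "more data = more predictability" pattern the curve visualizes.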
Accuracy Tradeoff (16.2)
  efficient frontier line (sketched in code below)
    plots speed vs. accuracy
      negatively correlated: the most accurate models tend to be the slowest to score
    shows which models are best (most efficient) at each speed
  is prediction time a factor for you?
    example: if website users need responses too fast for your most accurate model, you may need two models, a fast one for real-time scoring and an accurate one for everything else
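To make the efficient-frontier idea concrete, here is a small Python sketch that flags which of a set of hypothetical models sit on the frontier, i.e. are not beaten on both speed and accuracy by some other model (every name and number below is invented for illustration):

```python
# Sketch: find the "efficient frontier" among candidate models scored on
# prediction speed and validation accuracy. All figures are made up.
candidates = {
    # name: (predictions_per_second, validation_accuracy)
    "logistic_regression": (50_000, 0.86),
    "gradient_boosting":   (8_000,  0.91),
    "random_forest":       (5_000,  0.90),  # dominated: slower AND less accurate than GBM
    "deep_net":            (1_500,  0.93),
}

def on_frontier(name, stats):
    """A model is efficient if no other model is both faster and more accurate."""
    speed, acc = stats
    return not any(
        s > speed and a > acc
        for other, (s, a) in candidates.items()
        if other != name
    )

for name, stats in candidates.items():
    tag = "efficient" if on_frontier(name, stats) else "dominated"
    print(f"{name:20s} {tag}")
```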
Blueprints (16.3)
  each model works differently depending on the algorithm it uses
  you can't see the exact inner workings
    unless you look in the DataRobot model docs
  example: regularized logistic regression (its preprocessing steps are sketched in code below)
    imputation
      the median is used to fill in missing values
      the justification for this choice is given, along with which values were imputed
    standardization
      the mean is set to zero
      the standard deviation is scaled to unit variance (i.e., 1)
    one-hot encoding
      converts a categorical feature into numeric columns based on its unique values
      min_card/min_max
        thresholds on how many unique values a feature needs for this encoding
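As a rough stand-in for what this blueprint does internally, here is a scikit-learn pipeline sketch combining median imputation, standardization, and one-hot encoding in front of a regularized logistic regression (the column names and the min_frequency threshold are assumptions, not DataRobot's actual settings):

```python
# A minimal scikit-learn sketch of the preprocessing a regularized logistic
# regression blueprint performs. DataRobot assembles this automatically,
# so everything below is an illustrative stand-in, not its real internals.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "income"]          # hypothetical column names
categorical_features = ["region", "channel"]  # hypothetical column names

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values with the median
    ("scale", StandardScaler()),                   # mean -> 0, standard deviation -> 1
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric_features),
    # One numeric indicator column per unique value; min_frequency is
    # scikit-learn's rough analogue of a minimum-cardinality threshold.
    ("cat", OneHotEncoder(min_frequency=10), categorical_features),
])

# Regularized logistic regression on top of the preprocessed features
# (C is the inverse of the regularization strength).
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])
# Usage would be model.fit(train_df, y) on a DataFrame with those columns.
```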
Hyperparameters (16.4)
  settings that control how an algorithm learns; tuning them is among the most important and most complex parts of the process (a small tuning sketch follows)
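For a concrete picture of what tuning means, here is a small scikit-learn sketch that grid-searches a single hyperparameter, the regularization strength C of a logistic regression (DataRobot performs this kind of search automatically; the grid here is illustrative):

```python
# A minimal hyperparameter-tuning sketch via grid search. DataRobot tunes
# hyperparameters for you; this scikit-learn example is only illustrative.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Try several regularization strengths and keep the one with the best
# cross-validated LogLoss.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    scoring="neg_log_loss",
    cv=5,
)
search.fit(X, y)
print("best C:", search.best_params_["C"])
print("best LogLoss:", -search.best_score_)
```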