Chapter 16: understanding the process
Learning curves and speed
Forms of additional data
Additional features
Additional cases
Validation score on the y-axis and amount of data used on the x-axis
Log loss should be low
More data = diminishing returns
At some point the cost outweighs the benefit
Data becomes stale over time
As the size of the data increases, cross validation is better
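To make "log loss should be low" concrete, here is a minimal sketch of the metric, assuming a binary target. The function name, the clipping constant, and the sample labels and probabilities are illustrative assumptions, not values from the chapter.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss; lower values mean better-calibrated predictions."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical labels and predicted probabilities, for illustration only.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # close to the true labels
uncertain = [0.6, 0.4, 0.5, 0.5]   # hedges toward 0.5

print(log_loss(labels, confident))
print(log_loss(labels, uncertain))
```

The confident, correct predictions score a lower log loss than the uncertain ones, which is why a learning curve that trends downward indicates improvement.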
Accuracy Tradeoffs
Speed vs. Accuracy tab
Shows top 10 models by validation score
Model must be capable of producing predictions rapidly as new cases are arriving
Start by calculating the speed of the slowest model in cases per second and compare it with the predictive measure
Some real-life situations require fast prediction speeds (tradeoff)
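The cases-per-second check described above can be sketched as follows. This is an assumption-laden illustration: the function names, the timing figures, and the arrival rate are all hypothetical and do not come from the tool itself.

```python
def cases_per_second(n_cases, seconds):
    """Prediction throughput: number of cases scored per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return n_cases / seconds

def fast_enough(n_cases, seconds, arrival_rate):
    """True if the model scores cases at least as fast as they arrive."""
    return cases_per_second(n_cases, seconds) >= arrival_rate

# Hypothetical timings: the slowest model scores 10,000 cases in 4 seconds,
# while new cases arrive at 1,000 per second.
slowest = cases_per_second(10_000, 4.0)
print(slowest)                           # 2500.0
print(fast_enough(10_000, 4.0, 1_000))   # True
```

If even the slowest model keeps up with the arrival rate, speed stops being a constraint and the choice can rest on accuracy alone.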
Blueprints
Imputes missing values with averages or a large negative number
Shows algorithm steps
Tree-based algorithms
Imputation
Can use the median value for the feature
See the "missing values imputed" box in the blueprint for info
For each feature imputed, another feature is created, called an "indicator"
Indicator will contain false if the value was not imputed and true if it was imputed
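A minimal sketch of median imputation with an indicator feature, assuming missing values are represented as None. The function name, the sample ages, and the -9999 fill value are hypothetical choices for illustration.

```python
from statistics import median

def impute_with_indicator(values, fill=None):
    """Replace None with a fill value; return (imputed values, indicator)."""
    observed = [v for v in values if v is not None]
    if fill is None:
        fill = median(observed)  # default: median of the observed values
    imputed = [fill if v is None else v for v in values]
    indicator = [v is None for v in values]  # True where a value was imputed
    return imputed, indicator

# Hypothetical feature with two missing values.
ages = [25, None, 40, 31, None]
imputed, was_imputed = impute_with_indicator(ages)
print(imputed)       # [25, 31, 40, 31, 31]
print(was_imputed)   # [False, True, False, False, True]

# Alternative: fill with a large negative number (value chosen arbitrarily).
print(impute_with_indicator(ages, fill=-9999)[0])  # [25, -9999, 40, 31, -9999]
```

Keeping the indicator alongside the imputed feature lets a model learn whether "missing" is itself informative.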
Standardization
Numeric features are all standardized
Some models struggle with features that have different standard deviations
Each feature is scaled so that its mean is zero
Standard deviation is set to one ("unit variance")
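Standardization as described above can be sketched like this. The use of the population standard deviation is an assumption (the notes do not say which variant the blueprint uses), and the income values are made up.

```python
from statistics import mean, pstdev

def standardize(values):
    """Scale values to mean 0 and standard deviation 1: z = (x - mu) / sigma."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

# Hypothetical numeric feature.
incomes = [30_000, 45_000, 60_000, 75_000]
scaled = standardize(incomes)

print(scaled)
print(mean(scaled), pstdev(scaled))
```

After scaling, every numeric feature sits on the same footing, so no feature dominates merely because it is measured in larger units.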
One-hot encoding
For categorical data, a new feature is created for each category
Each category is converted to a binary field
Helps algorithms estimate
Smallest feature is removed
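One-hot encoding can be sketched as below. Reading "smallest feature is removed" as "drop the column for the least frequent category" is an assumption, as are the function name, the `is_` column prefix, and the sample data.

```python
from collections import Counter

def one_hot(values, drop_smallest=True):
    """Create one binary column per category, optionally dropping the rarest."""
    counts = Counter(values)
    categories = sorted(counts)
    if drop_smallest and len(categories) > 1:
        # Assumed interpretation: remove the least frequent category.
        smallest = min(categories, key=lambda c: counts[c])
        categories.remove(smallest)
    return {f"is_{c}": [int(v == c) for v in values] for c in categories}

# Hypothetical categorical feature.
colors = ["red", "blue", "red", "green", "blue", "red"]
encoded = one_hot(colors)
print(encoded)
# {'is_blue': [0, 1, 0, 0, 1, 0], 'is_red': [1, 0, 1, 0, 0, 1]}
```

Dropping one column loses no information: a row with zeros in every remaining column must belong to the removed category.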