Chapter 16: Understanding the Process
Learning Curves and Speed
Two forms of additional data can be tested to see whether more data would improve the models' predictive ability
Additional Features
Additional Cases
Good idea to work under the rule of diminishing marginal improvement
Loss = a measure of mistakes; every mistake increases the "loss score"
Generally safe to assume that models will benefit from more data, but weigh the expected improvement against the cost of obtaining it
"Stale" data = data from an earlier date; adding it might not lead to improvements
When considering adding more data, use cross validation
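The notes on loss and diminishing marginal improvement can be sketched in a few lines. This is a minimal illustration, assuming log loss as the loss measure (the notes do not name one); the function names and the cross-validation losses at each sample size are hypothetical, not results from any real project.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood for binary (0/1) labels."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def marginal_improvements(sample_sizes, losses):
    """Loss reduction gained at each step up in sample size."""
    return [
        (sample_sizes[i], losses[i - 1] - losses[i])
        for i in range(1, len(losses))
    ]

# Confident correct predictions give a low loss; confident mistakes raise it.
good = log_loss([1, 0, 1], [0.9, 0.1, 0.8])
bad = log_loss([1, 0, 1], [0.2, 0.9, 0.3])

# Hypothetical cross-validation losses at three sample sizes (illustrative only).
sizes = [1000, 2000, 4000]
cv_losses = [0.60, 0.55, 0.53]
gains = marginal_improvements(sizes, cv_losses)
```

If the gain from each doubling of cases keeps shrinking, as in the made-up numbers here, that shrinking gain is what should be compared against the cost of collecting more data.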
Accuracy Tradeoffs
Efficient Frontier line: a line drawn between data points; it occurs when two criteria (such as speed and accuracy) are negatively related to each other
Always look for the EF line
Models must be capable of producing predictions as rapidly as new cases are arriving
Compare results with the reality of the predictive needs required for the project
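One way to picture the Efficient Frontier is as the set of models that no other model beats on both criteria at once. The sketch below assumes each model is described by a prediction time and a loss (lower is better for both); the model names, numbers, and the 50 ms budget are hypothetical.

```python
def efficient_frontier(models):
    """Return the models not dominated on both prediction time and loss.

    Each model is a tuple (name, prediction_time_ms, loss).
    """
    frontier = []
    for name, time_ms, loss in models:
        dominated = any(
            (t <= time_ms and l <= loss) and (t < time_ms or l < loss)
            for _, t, l in models
        )
        if not dominated:
            frontier.append((name, time_ms, loss))
    return sorted(frontier, key=lambda m: m[1])

# Hypothetical leaderboard entries (illustrative only).
candidates = [
    ("model_a", 5.0, 0.62),
    ("model_b", 20.0, 0.55),
    ("model_c", 25.0, 0.58),  # slower and less accurate than model_b
    ("model_d", 80.0, 0.51),
]
frontier = efficient_frontier(candidates)

# Keep only frontier models that fit an assumed per-prediction time budget.
budget_ms = 50.0
usable = [m for m in frontier if m[1] <= budget_ms]
```

The final filter reflects the point about predictive needs: the most accurate model on the frontier is only useful if it can score cases as fast as they arrive.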
Blueprints
Each model employs a different set of pre-processing steps unique to that type of model
Imputation
Standardization: features are rescaled so that the mean value of each feature is 0 and its standard deviation is 1
One-hot encoding
A blueprint shows more information about a model
Logistic Regression: a class of generalized linear model that uses the binomial distribution to fit regression models to a binary (0/1) response variable
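The three pre-processing steps named above can be sketched with the standard library alone. This is an illustration under simple assumptions (mean imputation, population standard deviation, alphabetical category order); the helper names and toy columns are hypothetical, and a real blueprint may use different variants of each step.

```python
import statistics

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

def standardize(values):
    """Rescale so the feature has mean 0 and standard deviation 1.

    Assumes the feature is not constant (standard deviation > 0).
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def one_hot(values):
    """Turn a categorical feature into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical toy columns (illustrative only).
age = [20.0, None, 40.0, 60.0]
color = ["red", "blue", "red", "green"]

age_scaled = standardize(impute_mean(age))
categories, color_encoded = one_hot(color)
```

Imputation runs before standardization here because the mean and standard deviation cannot be computed while values are missing.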
Hyperparameter Optimization
Step where AutoML provides one of its most important contributions to machine learning
Parameter tuning can mean the difference between a successful project and a mediocre project
Evaluation of whether or not more data cases are needed to increase the accuracy of models
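To make the hyperparameter optimization step concrete, here is a minimal sketch of an exhaustive grid search over two hyperparameters of a one-feature logistic regression. Grid search is only one possible strategy and is an assumption here; the notes do not say which search method an AutoML tool uses. The data, grid values, and function names are hypothetical.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, learning_rate, l2, epochs=200):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / n
        w -= learning_rate * (grad_w + l2 * w)  # L2 penalty shrinks the weight
        b -= learning_rate * grad_b
    return w, b

def validation_loss(xs, ys, w, b, eps=1e-15):
    """Log loss of the fitted model on held-out cases."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x + b), eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

# Hypothetical toy data (illustrative only).
train_x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
train_y = [0, 0, 0, 1, 0, 1, 1, 1]
valid_x = [-1.8, -0.8, 0.8, 1.8]
valid_y = [0, 0, 1, 1]

# Try every combination of the candidate hyperparameter values.
grid = {"learning_rate": [0.01, 0.1, 1.0], "l2": [0.0, 0.1, 1.0]}
results = []
for lr, l2 in itertools.product(grid["learning_rate"], grid["l2"]):
    w, b = train_logistic(train_x, train_y, lr, l2)
    results.append((validation_loss(valid_x, valid_y, w, b), lr, l2))

best_loss, best_lr, best_l2 = min(results)
```

The same model type produces a different validation loss for each combination, which is why tuning can separate a successful project from a mediocre one.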