Chapter 16
Learning curves and speed
Additional data
additional cases
additional features
Calculate cost of additional data
use cross-validation to assess model quality
diminishing returns: the first data collected improves the model most; each additional batch is less useful
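A minimal sketch of the cost calculation, assuming you already have a few learning-curve points (rows used, cross-validated error in percent) and a fixed price per extra row. The function name `marginal_gains` and all numbers are made up for illustration; they are not from the chapter or from DataRobot.

```python
def marginal_gains(curve, cost_per_row):
    """For each step along a learning curve, report what the extra data bought.

    curve: list of (n_rows, cv_error_percent), sorted by n_rows.
    Returns a list of (rows_added, error_reduction, cost_per_point_of_reduction).
    """
    steps = []
    for (n0, e0), (n1, e1) in zip(curve, curve[1:]):
        rows_added = n1 - n0
        reduction = e0 - e1
        cost = rows_added * cost_per_row
        # No improvement means no amount of money buys a point of error reduction.
        cost_per_point = cost / reduction if reduction > 0 else float("inf")
        steps.append((rows_added, reduction, cost_per_point))
    return steps


# Hypothetical learning curve: error falls quickly at first, then flattens.
curve = [(1000, 40), (2000, 30), (4000, 25), (8000, 23)]
steps = marginal_gains(curve, cost_per_row=0.5)
for rows_added, reduction, cost_per_point in steps:
    print(rows_added, reduction, cost_per_point)
```

With these invented numbers the cost per point of error reduction rises at every step, which is the "additional data is less useful" pattern in the notes.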
Speed & Accuracy Tradeoffs
generally negatively correlated
Look for "efficient frontier line"
i.e., the models that give the best accuracy for a given prediction time
if time is a factor:
follow efficient frontier until model is reasonably accurate
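One way to picture "following the efficient frontier" is the sketch below. It assumes each model is summarized by a prediction time and an accuracy; the helper names and the example models are hypothetical, not taken from DataRobot.

```python
def efficient_frontier(models):
    """Keep only models that no faster (or equally fast) model matches or beats.

    models: list of (name, seconds, accuracy).
    Returns the frontier sorted from fastest to slowest.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Sort by time; among equally fast models, consider the most accurate first.
    for name, seconds, accuracy in sorted(models, key=lambda m: (m[1], -m[2])):
        if accuracy > best_accuracy:
            frontier.append((name, seconds, accuracy))
            best_accuracy = accuracy
    return frontier


def fastest_accurate_enough(models, min_accuracy):
    """Walk the frontier from the fast end and stop at the first acceptable model."""
    for model in efficient_frontier(models):
        if model[2] >= min_accuracy:
            return model
    return None


# Hypothetical leaderboard: (name, prediction time in seconds, accuracy).
models = [
    ("A", 1.0, 0.70),
    ("B", 2.0, 0.68),
    ("C", 3.0, 0.80),
    ("D", 5.0, 0.82),
    ("E", 4.0, 0.75),
]
frontier_names = [m[0] for m in efficient_frontier(models)]
choice = fastest_accurate_enough(models, min_accuracy=0.78)
```

Models B and E drop out because a faster model is already more accurate, so when time matters only the frontier models need to be compared.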
Blueprints
Shows how DataRobot came to a specific conclusion
can click on a specific box to see how DataRobot drew its conclusions
"one-hot encoding" = standard technique (not proprietary) that turns each category of a categorical feature into its own 0/1 column
Missing values imputed
values are standardized
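The three preprocessing steps named above can be sketched in plain Python. This is only an illustration of the general techniques; the function names are hypothetical, and filling with the mean is an assumption, since the notes do not say which fill value or exact steps a given blueprint uses.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to all zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


categories, encoded = one_hot(["red", "blue", "red"])
filled = impute_mean([1.0, None, 3.0])
scaled = standardize(filled)
```

Here `categories` is `["blue", "red"]`, so each row of `encoded` has a 1 in the column for its own colour and 0 elsewhere.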
Hyperparameter Optimization
Uses an algorithm to search for the best model settings (hyperparameters)
Outperforms top data scientists in this respect
Shows the number of trees used to predict and how many splits the trees use
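The notes do not say which search algorithm DataRobot uses, so the sketch below substitutes the simplest one, an exhaustive grid search over number of trees and splits. `toy_validation_error` is a made-up stand-in for "train a model and measure its validation error"; no real model is trained here.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Try every combination in the grid and keep the one with the lowest score.

    score_fn: callable taking the hyperparameters as keyword arguments.
    grid: dict mapping hyperparameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_validation_error(n_trees, max_splits):
    # Hypothetical stand-in: pretends error is lowest at 200 trees and 6 splits.
    return abs(n_trees - 200) / 100 + abs(max_splits - 6) / 2


grid = {"n_trees": [50, 100, 200, 400], "max_splits": [2, 4, 6, 8]}
best_params, best_score = grid_search(toy_validation_error, grid)
```

In practice `score_fn` would fit a tree-based model with the given settings and return its cross-validated error, and smarter searches can avoid trying every combination.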