Larsen Chapters 21-23 mega mindmap
Chapter 21 - Set up prediction system
possibility of the model not performing as expected due to subtle changes in the model's environment
steps can be taken to improve model reliability
retrain the model with 100% of the data ("run with new sample size" in DataRobot)
Choose Deployment Strategy
DataRobot provides numerous model deployment strategies
API scoring
straightforward if you can code in R or Python
write a short script that uploads data to the API and receives probabilities back (see the first sketch after this list)
DataRobot Prime scoring
creates an approximation of the selected model ("Run DataRobot Prime")
downside: integrating and maintaining the model becomes an in-house responsibility; upside: vendor independence
drag and drop scoring
easiest method in DataRobot: upload files, compute predictions, then download the results
Batch scoring
uses the DataRobot API to upload and score multiple files in parallel (see the second sketch after this list)
in-place scoring
export the selected model as an executable file to be used in an Apache Spark environment (see the third sketch after this list)
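First sketch (API scoring): a minimal example of posting a CSV of records to a deployed model's prediction endpoint and reading back probabilities. The endpoint path follows the shape of DataRobot's prediction API, but the host, deployment ID, token, and key are placeholders; substitute the values from your own instance.

```python
# Minimal sketch of API scoring: upload rows of data, get back probabilities.
# All identifiers below are placeholders for your own DataRobot instance.
import requests

API_URL = "https://example.datarobot.com/predApi/v1.0/deployments/{deployment_id}/predictions"
DEPLOYMENT_ID = "YOUR_DEPLOYMENT_ID"   # placeholder
API_TOKEN = "YOUR_API_TOKEN"           # placeholder
DATAROBOT_KEY = "YOUR_DATAROBOT_KEY"   # placeholder

def score_file(csv_path: str) -> dict:
    """POST a CSV of feature rows and return the parsed prediction payload."""
    with open(csv_path, "rb") as f:
        response = requests.post(
            API_URL.format(deployment_id=DEPLOYMENT_ID),
            data=f,
            headers={
                "Content-Type": "text/csv; charset=UTF-8",
                "Authorization": f"Bearer {API_TOKEN}",
                "DataRobot-Key": DATAROBOT_KEY,
            },
        )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = score_file("new_records.csv")
    # each element of the returned data typically carries the predicted
    # label and class probabilities for one input row
    for row in result["data"]:
        print(row)
```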
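Second sketch (batch scoring): fanning several files out to the prediction API in parallel with a thread pool. It assumes the score_file() helper from the previous sketch has been saved in a local module named api_scoring (a hypothetical name).

```python
# Minimal sketch of batch scoring: score many CSV files concurrently.
from concurrent.futures import ThreadPoolExecutor, as_completed

from api_scoring import score_file  # hypothetical local module, see above

def score_batch(csv_paths, max_workers=4):
    """Submit every file to the scoring API and collect results per file."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(score_file, path): path for path in csv_paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # keep scoring the other files on failure
                results[path] = exc
    return results

if __name__ == "__main__":
    for path, payload in score_batch(["jan.csv", "feb.csv", "mar.csv"]).items():
        print(path, "->", type(payload).__name__)
```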
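Third sketch (in-place scoring): applying an exported model inside Spark, so the data is scored where it lives instead of being shipped to an external service. The exported_model module and its predict() function are hypothetical stand-ins for whatever artifact DataRobot exports; only the pattern is the point.

```python
# Minimal sketch of in-place scoring in an Apache Spark environment.
from pyspark.sql import SparkSession
from pyspark.sql.types import DoubleType

from exported_model import predict  # hypothetical exported scoring function

spark = SparkSession.builder.appName("in-place-scoring").getOrCreate()
df = spark.read.csv("hdfs:///data/new_records.csv", header=True, inferSchema=True)

# output schema: all input columns plus the new probability column
schema = df.schema.add("probability", DoubleType())

def score_partition(batches):
    """Apply the exported model to each pandas batch of a partition."""
    for pdf in batches:
        out = pdf.copy()
        out["probability"] = predict(pdf)  # hypothetical predict()
        yield out

# score in place; no round-trip of the data to an external service
scored = df.mapInPandas(score_partition, schema=schema)
scored.write.mode("overwrite").parquet("hdfs:///data/scored_records.parquet")
```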
Chapter 22 - Document modeling process for reproducibility
model documentation
where many projects fail
proper documentation is critical for others to understand the actions taken and the justification for the project's existence
work under the assumption that the project will need to be revisited in a year
make careful notes about the business purpose served by the model
document where all of the data in the model came from
specify all steps taken in DataRobot
record the business rules for use of the model and the chosen probability thresholds
Chapter 23 - Create model monitoring and maintenance plan
potential problems
potential environment changes vary widely: they can occur in the business data environment or in the outside world
strategies
good idea to re-run the model often, as soon as sufficient new data is available
important to wait until the target values are also available
early detection of declining performance is paramount (see the first sketch below)
evaluate training data against new data
create a new target that specifies whether each row comes from the original training data or the new data
doesn't require waiting for new target values to become available, which is a major benefit (see the second sketch below)
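First sketch (performance monitoring): once target values for previously scored records become available, compare the model's AUC on that window against the baseline measured at deployment and alert on a meaningful drop. The file name, baseline, and tolerance are illustrative placeholders.

```python
# Minimal sketch of performance monitoring on newly labeled data.
import pandas as pd
from sklearn.metrics import roc_auc_score

BASELINE_AUC = 0.85  # AUC measured on holdout at deployment (placeholder)
TOLERANCE = 0.05     # acceptable degradation before raising an alarm

def check_window(scored_csv: str) -> None:
    """scored_csv holds the stored probabilities plus the now-known targets."""
    window = pd.read_csv(scored_csv)
    auc = roc_auc_score(window["target"], window["probability"])
    if auc < BASELINE_AUC - TOLERANCE:
        print(f"ALERT: AUC fell to {auc:.3f}; consider retraining")
    else:
        print(f"OK: AUC {auc:.3f} within tolerance of baseline {BASELINE_AUC}")

if __name__ == "__main__":
    check_window("scored_with_targets_march.csv")  # placeholder file name
```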
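Second sketch (drift check): the training-vs-new-data comparison described above, sometimes called adversarial validation. Label original training rows 0 and new rows 1, train a classifier on that synthetic target, and measure how well it separates the two: an AUC near 0.5 means the feature distributions still match, while a clearly higher AUC signals drift. Assumes numeric features; the file names and the 0.6 threshold are illustrative.

```python
# Minimal sketch of a training-vs-new-data drift check.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def drift_auc(train_df: pd.DataFrame, new_df: pd.DataFrame) -> float:
    """How well can a classifier tell original rows from new rows?"""
    combined = pd.concat([train_df, new_df], ignore_index=True)
    # synthetic target: 0 = original training data, 1 = new data
    origin = [0] * len(train_df) + [1] * len(new_df)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    scores = cross_val_score(clf, combined, origin, cv=5, scoring="roc_auc")
    return scores.mean()

if __name__ == "__main__":
    train = pd.read_csv("training_features.csv")  # placeholder file names,
    new = pd.read_csv("new_features.csv")         # numeric features assumed
    auc = drift_auc(train, new)
    print(f"separability AUC: {auc:.3f}")
    if auc > 0.6:  # illustrative threshold
        print("distributions differ noticeably; investigate drift")
```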