chapter 21-23
chapter 21- set up prediction system
retraining model
Recall the possibility that the best model developed in DataRobot may not perform
quite as well as expected when put into practice, as suggested by the holdout
sample's LogLoss score being slightly worse than that of the cross validation sample.
One reason a model's success rate can be lower than expected, perhaps even
deteriorating over time, is that the environment being modelled changes subtly as
time passes.
The model in this case was trained with data randomly selected from an unknown
time period (the period during which the patients selected for inclusion visited
the hospital), and its performance was validated with data collected during that
same period. There was no alternative course of action, as this data set provided
no access to the date of the patient visit (perhaps in order to maintain patient
privacy).
That being said, there are steps that can be taken to improve predictive ability,
perhaps even to the point of negating such concerns.
Since validation scores are no longer needed to order models, nor is holdout data
necessary to evaluate for overfitting, 100% of the data can now be utilized to
create a model.
choose deployment strategy
Drag-and-drop
Application Programming Interface (API)
DataRobot Prime
Batch
In-place with Spark
Drag-and-drop Scoring is accessed through the Predict screen of the selected
model, per Figure 21.4. Since this screen has been addressed previously, specific
details are passed over, apart from the need to upload a file containing all the
features used to create the model. The file does not need to include a target,
since the target is what DataRobot is now preparing to predict. After uploading a
file to DataRobot, click Compute Predictions. DataRobot will apply the model to
all the uploaded data, after which the results can be downloaded by clicking
Download.
API Scoring is relatively straightforward for those able to program in R or Python.
An Application Programming Interface (API) is created on the DataRobot server,
allowing a developer to write a program that uploads new patient data to the API,
which then returns a probability that each patient will be readmitted.
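A hedged sketch of what such a program might look like in Python, using only the standard library. The endpoint URL, deployment id, token, and feature names below are placeholder assumptions; the real values come from your DataRobot deployment's integration settings, not from this book.

```python
import json
import urllib.request

# Illustrative placeholders: substitute the URL and token shown in your
# own DataRobot deployment's integration snippet.
API_URL = ("https://app.datarobot.com/predApi/v1.0/"
           "deployments/YOUR_DEPLOYMENT_ID/predictions")
API_TOKEN = "YOUR_API_TOKEN"

def build_request(patients):
    """Package a list of new patient records as a JSON scoring request."""
    body = json.dumps(patients).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + API_TOKEN,
        },
        method="POST",
    )

# One new patient, with hypothetical feature names.
request = build_request([{"age": 67, "num_lab_procedures": 41}])
# The actual call (not executed here) would return readmission probabilities:
# with urllib.request.urlopen(request) as response:
#     scores = json.load(response)
```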
DataRobot Prime Scoring creates an approximation of the selected model,
available as code in the Python and Java programming languages. Prime Scoring
availability depends on DataRobot account type. DataRobot cannot guarantee that
this code is as accurate as the original model, but in the experience of this
book's authors it is often quite close. The Prime-generated code may then be
placed into the business workflow.
Batch Scoring uses the DataRobot API to upload and score multiple large files in
parallel
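The chunk-and-score-in-parallel idea behind batch scoring can be sketched in a few lines of Python; `score_chunk` below is a hypothetical stand-in for the real DataRobot upload-and-score API call, not an actual DataRobot function.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_rows(rows, size):
    """Yield successive chunks of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def score_chunk(rows):
    """Hypothetical stand-in for uploading one chunk to the scoring API.

    Here every row simply receives a dummy probability of 0.5.
    """
    return [0.5 for _ in rows]

def batch_score(rows, size=10_000, workers=4):
    """Score chunks in parallel threads, flattening results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunk_results = list(pool.map(score_chunk, chunk_rows(rows, size)))
    return [score for chunk in chunk_results for score in chunk]
```

`pool.map` preserves chunk order, so the flattened scores line up with the input rows even though chunks finish at different times.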
In-place Scoring allows for exporting the selected model as an executable file to
be used in an Apache Spark environment. Spark is a fast, widely used distributed
data-processing environment.
chapter 23 - create model monitoring and maintenance
potential problems
documenting potential problems serves to inform others what to do in the event of
changes in the environment that stand to impact the effectiveness of the model
types of environmental change
new systems
real world environmental changes
strategies
DataRobot will fail rather than attempt to make the best of the available data.
This is actually good: an outright failure is noticed and fixed immediately,
whereas a model quietly producing degraded predictions may go undetected.
wait until target value is available
To avoid model failure, it is a good idea to retrain the model as soon as
sufficient new data is available.
early detection is really important
detect declining performance by evaluating training data against new data
create a new target that specifies whether a case was used to create the original model or came from the production system after the model was used for prediction
Once a sufficient set of production cases is available, machine learning may be
run with the source of the data as the target. If the produced model is capable of
distinguishing between the two sets of data, this is an indication that the
business context has changed enough to warrant model retraining with access to
additional data that includes more recent patient cases.
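This train-versus-production test can be illustrated with plain Python on synthetic data. The single feature, its distributions, and the deliberately simple threshold classifier below are all illustrative stand-ins; any real classifier (including a DataRobot model) would play the same role.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-in for one numeric feature, e.g. patient age:
# the production-era distribution has drifted upward.
train_era = [random.gauss(50, 10) for _ in range(500)]   # original training data
production = [random.gauss(60, 10) for _ in range(500)]  # post-deployment cases

# The new target is the *source* of each case: 0 = training era, 1 = production.
cases = [(x, 0) for x in train_era] + [(x, 1) for x in production]
random.shuffle(cases)
fit, holdout = cases[:700], cases[700:]

# Deliberately simple classifier: split at the midpoint of the class means.
mean_train = statistics.mean(x for x, label in fit if label == 0)
mean_prod = statistics.mean(x for x, label in fit if label == 1)
threshold = (mean_train + mean_prod) / 2

def predict(x):
    return int(x > threshold)  # production mean is higher in this sketch

accuracy = sum(predict(x) == label for x, label in holdout) / len(holdout)
# Accuracy well above 0.5 means the two eras are distinguishable: the
# environment has drifted, and retraining is warranted.
```

If the two eras were indistinguishable, accuracy would hover near 0.5, the coin-flip baseline.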
chapter 22 - document modeling process for reproducibility
model documentation
Documenting the modeling process is where projects most often fail. Those
attracted by the “search for truth” aspect of machine learning are seldom motivated
by the required documentation that follows a machine learning project. Most would
much rather create the model, implement it, and move on, treating skipped model
documentation as an opportunity to do more of the desirable central machine
learning work.
Making careful notes about the business purpose served by the model will also help
to save time in the future. Pay attention to the following: Where did the data
come from? How was the data processed? What parameters or selections were used
when creating and selecting the model? How is the model used within the business?
above all, document where the data came from
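One way to capture these questions is a standing documentation template kept alongside each model; the headings below are an illustrative skeleton, not a DataRobot artifact.

```
Model name / version:   ...
Data source:            where the data came from; extraction date
Data processing:        cleaning steps, feature engineering, exclusions
Modeling choices:       algorithm, parameters, selection criteria
Business use:           who consumes the predictions, and how
Retraining plan:        trigger conditions and schedule
```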