Intro To Machine Learning Workflow
Big picture
data
dataset knowledge
model
choosing the correct ML model/algorithm for the problem
problem
setting constraints, goals, and boundaries
Framing the problem
do we need ML?
who are the key stakeholders?
who is going to be impacted?
what data can I use?
need to
understand where the data came from
which metrics are important
should be easy to understand
adapted to the specific problem
business metrics != ml model metrics
confusion matrix
2x2 grid that displays
TP
TN
FP
FN
classification metrics
Precision
TP/(TP + FP)
how trustworthy the model's positive predictions are
Of the elements classified as a particular class, how many did we get right?
Recall
TP/(TP + FN)
how many of the actual positives the model finds
Of the elements that actually belong to a class, how many did the model correctly identify?
Accuracy
(TP + TN) / (TP + FN + FP + TN)
how good algorithm is at predicting the correct thing
The number of correctly classified images over the total number of images.
Only for classification (see the metrics sketch below)
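A minimal sketch (not from the diagram) of how the three metrics above follow from confusion-matrix counts; the count values are placeholders.

```python
# Minimal sketch: classification metrics from confusion-matrix counts.
# The counts below are hypothetical placeholders.
tp, tn, fp, fn = 40, 45, 5, 10

precision = tp / (tp + fp)                   # of predicted positives, how many were right
recall = tp / (tp + fn)                      # of actual positives, how many were found
accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions over all predictions

print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```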
Object detection
green boxes are the ground truths
red dashed boxes are the model's predictions
Uses precision and recall from classification (see the IoU sketch below)
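The diagram only says that detection reuses precision and recall; a common convention, assumed here rather than stated in the source, is to count a prediction as a true positive when its box overlaps an unmatched ground-truth box by enough intersection-over-union (IoU). A minimal sketch with boxes given as [x1, y1, x2, y2]:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def detection_precision_recall(preds, gts, threshold=0.5):
    """Greedy matching: a prediction is a TP if it overlaps an unmatched ground truth."""
    matched = set()
    tp = 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= threshold:
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return precision, recall

# One prediction matches the first ground truth, the second is missed.
print(detection_precision_recall([[0, 0, 10, 10]], [[1, 1, 9, 9], [20, 20, 30, 30]]))
# -> (1.0, 0.5)
```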
Exploratory Data Analysis
looking at the data before working with it (see the EDA sketch after this list)
object density
environment
weather
light
ML algorithms are sensitive to domain shift
this can happen at different levels
weather/light conditions
sensor
environment
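A minimal EDA sketch (the column names weather, light, and num_objects are made up for illustration) that checks how evenly the data covers the conditions listed above, since gaps there are where domain shift bites:

```python
import pandas as pd

# Hypothetical per-frame metadata table; column names are assumptions.
frames = pd.DataFrame({
    "weather": ["clear", "rain", "clear", "fog", "clear"],
    "light": ["day", "day", "night", "day", "night"],
    "num_objects": [12, 4, 7, 3, 9],
})

# How balanced is the dataset across the conditions the model will face?
print(frames.groupby(["weather", "light"]).size())

# Object density per frame: a heavily skewed distribution hints at domain shift risk.
print(frames["num_objects"].describe())
```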
Cross Validation
ensuring an ML algorithm can perform well in any environment
we should evaluate the generalization ability of the ML model
overfitting
when the model does not generalize well
it fits the training data exactly
performs well on the training data but poorly on unseen data
happens when the chosen model is too complex
bias-variance tradeoff
why it is hard to create a balanced model
TestError = Variance + Bias² + ε (irreducible error)
error rate of model on test data set
variance - sensitivity to training data
Increases with complexity
low variance means predictions change little when the training data changes, so the model adapts to new data more easily
bias - quality of the fit
decreases as model becomes more complex
Test error - error rate on the test data
plotted against model complexity, it forms a U-shaped (parabola-like) curve
the minimum of that curve marks the optimal complexity (see the sketch below)
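A minimal sketch of the tradeoff on synthetic data (the sine-wave data and the polynomial degrees are assumptions, not from the diagram): as complexity grows, training error keeps falling while test error first drops and then rises again.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: a noisy sine wave stands in for any regression problem.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Sweep model complexity (polynomial degree) and watch the test error dip then rise.
for degree in (1, 3, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```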
cross validation
technique to evaluate how well a model generalizes
Validation Set
include 80-90% of data in the training set
include 10-20% of data in the validation set
used for cross-validation to select the best parameters or compare models (see the split sketch below)
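A minimal sketch of the hold-out split described above (scikit-learn and the toy arrays are assumptions; the diagram does not name a library):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data; in practice X holds the features and y holds the labels.
X = np.arange(100).reshape(-1, 1)
y = np.arange(100)

# Hold out 20% of the data as a validation set (the 80/20 split mentioned above).
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
print(len(X_train), len(X_val))  # 80 20
```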
TFRecord
short for TensorFlow Record
not human readable
need the protocol buffer (.proto) definition to read the fields of a TFRecord
TensorFlow's custom binary data format (see the sketch below)
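A minimal sketch of writing and reading one record (the feature name "label" is made up); using tf.train.Example means the proto schema is the standard one that ships with TensorFlow:

```python
import tensorflow as tf

# Writing: serialize one record with a single int feature ("label" is a made-up field name).
example = tf.train.Example(features=tf.train.Features(feature={
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
}))
with tf.io.TFRecordWriter("sample.tfrecord") as writer:
    writer.write(example.SerializeToString())

# Reading: the file is not human readable, so the fields must be described to parse them.
feature_description = {"label": tf.io.FixedLenFeature([], tf.int64)}
dataset = tf.data.TFRecordDataset("sample.tfrecord")
for raw in dataset:
    parsed = tf.io.parse_single_example(raw, feature_description)
    print(parsed["label"].numpy())  # 1
```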