Setting up your goal
Satisficing and Optimizing metric
Satisficing
A condition that must be met. If this condition is not satisfied, the algorithm is not an option.
Optimizing
Of all the algorithms that satisfy the satisficing condition, the one with the best performance on the optimizing metric is chosen.
Examples
Accuracy & Running time
Accuracy is the optimizing metric: the goal is the highest accuracy possible. Running time is the satisficing metric: the condition is a running time of less than 100 ms.
Accuracy & False Positives
Accuracy is the optimizing metric: the algorithm with the best accuracy is chosen. But in the choosing process, we only include the algorithms that satisfy the satisficing condition, in this case at most one false positive in 24 h (see the sketch below).
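A minimal sketch of this selection rule, applied to the running-time example above. The candidate names and measurements are made-up assumptions for illustration:

```python
# Hypothetical (accuracy, running_time_ms) measurements for candidate models.
candidates = {
    "model_a": {"accuracy": 0.90, "running_time_ms": 80},
    "model_b": {"accuracy": 0.92, "running_time_ms": 95},
    "model_c": {"accuracy": 0.95, "running_time_ms": 150},  # fails the condition
}

# Satisficing: keep only the models that run in under 100 ms.
feasible = {name: m for name, m in candidates.items()
            if m["running_time_ms"] < 100}

# Optimizing: among the feasible models, pick the highest accuracy.
best = max(feasible, key=lambda name: feasible[name]["accuracy"])
print(best)  # model_b
```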
Size of Dev and Test sets
The 70/30 split (70% train, 30% test) or a 60/40 split has been a good approach, and still is, for sets without much data (roughly 10,000 examples or fewer).
Now we have sets with millions of rows. In a set with 1,000,000 examples it might be quite reasonable to use 98% for train, 1% for dev, and 1% for test.
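A minimal sketch of such a 98/1/1 split, assuming the dataset is a plain Python list of examples:

```python
import random

# Hypothetical dataset of 1,000,000 examples.
examples = list(range(1_000_000))
random.shuffle(examples)

n = len(examples)
n_dev = n_test = n // 100                 # 1% each for dev and test
train = examples[: n - n_dev - n_test]    # remaining 98% for train
dev = examples[n - n_dev - n_test : n - n_test]
test = examples[n - n_test :]

print(len(train), len(dev), len(test))  # 980000 10000 10000
```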
Test set
Set your test set to be big enough to give high confidence in the overall performance of your system.
When to change dev/test sets and metrics
When the evaluation metric no longer correctly rank-orders preferences between algorithms (e.g., a cat algorithm that lets pornographic images pass), that is a sign that the evaluation metric, the dev set, or the test set should be changed.
If doing well on your metric + dev/test set does not correspond to doing well on your application, change your metric and/or dev/test set.
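One possible fix for the cat/pornographic-image case is to re-weight the error metric so that unacceptable mistakes count much more heavily. This sketch is an illustration, not part of the notes above; the 10x weight is an assumption:

```python
# Hedged sketch: weighted classification error where letting a
# pornographic image pass counts 10x more than an ordinary mistake.
def weighted_error(y_true, y_pred, is_porn):
    weights = [10 if porn else 1 for porn in is_porn]
    wrong = [w for w, t, p in zip(weights, y_true, y_pred) if t != p]
    return sum(wrong) / sum(weights)

# Tiny usage example: one ordinary error, one porn image let through.
print(weighted_error([0, 1, 0], [1, 1, 1], [False, False, True]))  # ~0.917
```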
Single number evaluation metric
This can improve the efficiency of the project. Multiple metrics can be unclear, and it can take a while to decide which option is the best.
Examples
Precision & Recall
Precision
% of the algorithm's positive classifications that are correct
Cats example: if the algorithm says it's a cat, there is an x% chance of it being correct
Recall
% of the actual positive examples that are correctly identified
Cats example: x% of the cats are identified
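A minimal sketch of both definitions in terms of true/false positives and false negatives; the counts below are made up for illustration:

```python
# Hypothetical confusion counts for the cat classifier.
true_positives = 80   # cats correctly flagged as cats
false_positives = 20  # non-cats flagged as cats
false_negatives = 10  # cats the algorithm missed

# Precision: of everything flagged "cat", how much really is a cat?
precision = true_positives / (true_positives + false_positives)  # 0.80

# Recall: of all actual cats, how many were identified?
recall = true_positives / (true_positives + false_negatives)     # ~0.89

print(precision, recall)
```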
F1 score
A combination (the harmonic mean) of precision and recall
This should be the single metric chosen for the project.
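A sketch of the standard F1 formula, the harmonic mean of precision and recall, using the values from the sketch above:

```python
def f1_score(precision, recall):
    # Harmonic mean: F1 = 2 * P * R / (P + R)
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.80, 80 / 90))  # ~0.842
```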
Error % in different geographies
Imagine we have a set of algorithms, each with a % error in 10 geographies. It is very hard to choose the best one because the results form an N×10 matrix. If we add a column with the average error, it is much easier to choose the best algorithm.
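A minimal sketch of that extra column, assuming made-up error percentages per geography; the average collapses the matrix into one comparable number per algorithm:

```python
# Hypothetical % error per geography for three algorithms (10 geographies each).
errors = {
    "algo_a": [3.0, 5.1, 4.2, 6.0, 3.5, 4.8, 5.5, 4.1, 3.9, 5.0],
    "algo_b": [2.9, 4.8, 4.5, 5.7, 3.8, 4.6, 5.2, 4.0, 4.1, 4.9],
    "algo_c": [3.2, 5.4, 4.1, 6.2, 3.4, 5.0, 5.8, 4.3, 3.8, 5.2],
}

# The "average" column: one number per algorithm, lowest wins.
averages = {name: sum(e) / len(e) for name, e in errors.items()}
best = min(averages, key=averages.get)
print(averages, best)  # algo_b has the lowest average error
```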
Train/Dev/Test distributions
The distributions of the sets used in an ML project must be the same in all phases of the project. If our dev set is based on Europe and the test set only has data from the US, the results are going to be bad.
Choose a dev set and test set to reflect data you expect to get in the future and consider important to do well on.
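A minimal sketch of keeping dev and test on the same distribution: shuffle the combined data before splitting, so both sets reflect the same mix of regions. The region labels and sizes are illustrative assumptions:

```python
import random

# Hypothetical examples tagged with their region of origin.
data = [("eu", i) for i in range(5000)] + [("us", i) for i in range(5000)]

# Shuffle BEFORE splitting so dev and test share the same distribution.
random.shuffle(data)
dev, test = data[:5000], data[5000:]

# Both sets now contain roughly the same EU/US mix.
print(sum(r == "eu" for r, _ in dev), sum(r == "eu" for r, _ in test))
```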