Provost Ch. 7 - What is a Good Model?
False Positive = negative instances classified as positive
False Negative = positive instances classified as negative
Because the unusual class is rare among the general population, the class distribution is unbalanced or skewed.
As the class distribution becomes more skewed, evaluation based on accuracy breaks down.
Accuracy is the wrong thing to measure
Makes no distinction between false negative and false positive errors
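A small sketch of why accuracy misleads on skewed classes. The population (10 positives in 1000) and both "models" are made up for illustration; they are not from the chapter. Both score the same accuracy, yet one makes only false negative errors and the other only false positive errors.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of all decisions that were correct."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Hypothetical skewed population: 10 positives, 990 negatives.
actual = [1] * 10 + [0] * 990

# Model A: always predicts the majority (negative) class.
model_a = [0] * 1000
# Model B: catches every positive but also flags 10 negatives.
model_b = [1] * 20 + [0] * 980

counts_a = confusion_counts(actual, model_a)
counts_b = confusion_counts(actual, model_b)

print(counts_a, accuracy(counts_a))
print(counts_b, accuracy(counts_b))
```

Model A misses every positive and Model B misses none, but accuracy cannot tell them apart.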
Measure expected values instead
The expected value framework decomposes data-analytic thinking into:
the structure of the problem
the elements of the analysis that can be extracted from the data
the elements of the analysis that need to be acquired from other sources
Expected value is the weighted average of the values of the different possible outcomes, where the weight given to each value is its probability of occurrence
Can use this to determine which model will work best
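The weighted-average definition above can be sketched in a few lines. The function name `expected_value` and the probabilities and dollar values below are illustrative assumptions, not figures taken from the notes.

```python
def expected_value(outcomes):
    """Weighted average of outcome values.

    outcomes: list of (probability, value) pairs whose probabilities sum to 1.
    """
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Hypothetical targeting decision: 5% of targeted customers respond
# (profit 99), 95% do not (the offer costs 1).
ev_target = expected_value([(0.05, 99.0), (0.95, -1.0)])
# Not targeting is assumed to yield 0 for certain.
ev_no_target = expected_value([(1.0, 0.0)])

print(ev_target, ev_no_target)
```

Under these assumed numbers, targeting has the higher expected value, so it is the better decision. The same comparison works for choosing between models.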
Cost and Benefit Matrix
specifies the cost or benefit of making a decision for each (predicted class, actual class) pair
Costs and benefits cannot be estimated from the data - depend on external information
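A sketch of how a confusion matrix and a cost-benefit matrix might be combined into an expected profit per instance. The counts come from a hypothetical test set and the benefit values are assumed external business inputs, which is exactly the part that cannot be read off the data.

```python
def expected_profit(counts, benefits):
    """Expected profit per instance.

    counts:   confusion-matrix counts keyed by "tp", "fp", "tn", "fn"
    benefits: cost (negative) or benefit (positive) for the same keys
    """
    total = sum(counts.values())
    # Each count / total estimates the probability of that outcome.
    return sum((counts[k] / total) * benefits[k] for k in counts)


# Hypothetical confusion matrices for two models on 1000 instances.
counts_a = {"tp": 0, "fp": 0, "tn": 990, "fn": 10}
counts_b = {"tp": 10, "fp": 10, "tn": 980, "fn": 0}

# Assumed cost-benefit matrix, supplied from outside the data.
benefits = {"tp": 99.0, "fp": -1.0, "tn": 0.0, "fn": 0.0}

profit_a = expected_profit(counts_a, benefits)
profit_b = expected_profit(counts_b, benefits)
print(profit_a, profit_b)
```

Both models have the same accuracy here, but their expected profits differ once errors are priced.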
It is important to consider carefully what would be a reasonable baseline against which to compare model performance
classification tasks: good baseline = majority classifier (chooses the majority class of the training dataset)
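A minimal majority-classifier baseline, written from scratch so nothing depends on a particular library. The class name `MajorityClassifier` and the labels are made up for illustration.

```python
from collections import Counter


class MajorityClassifier:
    """Always predicts the most common class seen in the training labels."""

    def fit(self, labels):
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n):
        """Return the majority class for n instances."""
        return [self.majority_] * n


# Hypothetical skewed labels.
train_labels = ["no"] * 95 + ["yes"] * 5
test_labels = ["no"] * 18 + ["yes"] * 2

baseline = MajorityClassifier().fit(train_labels)
predictions = baseline.predict(len(test_labels))

correct = sum(p == a for p, a in zip(predictions, test_labels))
baseline_accuracy = correct / len(test_labels)
print(baseline.majority_, baseline_accuracy)
```

Any candidate model should beat this number before its accuracy is treated as meaningful.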
Maximizing prediction accuracy is not always appropriate
regression tasks: good baseline = predict the average value over the population
There may be multiple simple averages that one might want to combine
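A sketch of an average-based baseline for a numeric target, followed by one way two simple averages could be combined. All data is made up, and the equal weighting in `combined_baseline` is an assumption chosen for simplicity, not a method stated in the notes.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Baseline 1: always predict the training-set average.
train_targets = [10.0, 12.0, 14.0, 16.0]
test_targets = [11.0, 15.0]
population_average = mean(train_targets)
baseline_mae = mean_absolute_error(
    test_targets, [population_average] * len(test_targets)
)

# Baseline 2: combine two simple averages (per user and per movie).
# Hypothetical ratings as (user, movie, rating) triples.
ratings = [("u1", "m1", 4.0), ("u1", "m2", 2.0), ("u2", "m1", 5.0)]


def combined_baseline(user, movie):
    """Equal-weight mix of the user's average and the movie's average."""
    user_average = mean(r for u, _, r in ratings if u == user)
    movie_average = mean(r for _, m, r in ratings if m == movie)
    return (user_average + movie_average) / 2


print(population_average, baseline_mae)
print(combined_baseline("u2", "m1"))
```

A regression model is worth its complexity only if its error is clearly below the error of baselines like these.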