CH7 What Is a Good Model?
Classification accuracy: easy to measure, but too simplistic for applications of data mining techniques
accuracy = (number of correct decisions made) / (total number of decisions made)
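A minimal sketch of the accuracy formula, assuming the actual and predicted labels are available as equal-length Python lists. The function name `accuracy` and the sample labels are illustrative and do not come from the notes.

```python
def accuracy(actual, predicted):
    """Fraction of decisions where the predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one decision")
    # Count the decisions the classifier got right.
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels, for illustration only.
actual = ["p", "n", "n", "p", "n"]
predicted = ["p", "n", "p", "p", "n"]
acc = accuracy(actual, predicted)  # 4 of 5 decisions are correct
```

The single number hides which kind of mistake was made: here the one error is a negative predicted as positive, but accuracy would be the same if it were the other way around.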
Bad Positives VS Harmless Negatives
The Confusion Matrix
class confusion vs. the confusion matrix
A confusion matrix for a problem involving n classes is an n × n matrix with the columns labeled with actual classes and the rows labeled with predicted classes.
It separates out the decisions made by the classifier, making explicit how one class is being confused for another.
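The matrix can be built by counting (predicted, actual) pairs. The sketch below assumes columns hold actual classes and rows hold predicted classes. The function name, class labels, and data are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Return an n x n nested list: rows = predicted class, columns = actual class."""
    counts = Counter(zip(predicted, actual))
    # Counter returns 0 for (predicted, actual) pairs that never occur.
    return [[counts[(pred, act)] for act in classes] for pred in classes]


# Hypothetical two-class data, for illustration only.
classes = ["p", "n"]
actual = ["p", "p", "n", "n", "n", "p"]
predicted = ["p", "n", "n", "p", "n", "p"]
matrix = confusion_matrix(actual, predicted, classes)

# With rows = predicted and columns = actual:
true_positives = matrix[0][0]   # predicted p, actually p
false_positives = matrix[0][1]  # predicted p, actually n
false_negatives = matrix[1][0]  # predicted n, actually p
true_negatives = matrix[1][1]   # predicted n, actually n
```

Some libraries use the transposed layout (rows for actual classes), so it is worth checking the convention before reading off false positives and false negatives.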
Generalizing Beyond Classification
What is important in the application? What is the goal? Are we assessing the results of data mining appropriately given the actual goal?
The expected value framework decomposes data-analytic thinking into:
(i) the structure of the problem,
(ii) the elements of the analysis that can be extracted from the data, and
(iii) the elements of the analysis that need to be acquired from other sources (e.g., business knowledge of subject matter experts).
Exemplary techniques: Various evaluation metrics; Estimating costs and benefits; Calculating expected profit; Creating baseline methods for comparison.
Problems with Unbalanced Classes
Problems with Unequal Costs and Benefits
Accuracy makes no distinction between false positive and false negative errors
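Both problems can be seen with a little arithmetic. The sketch below uses made-up numbers: a population of 1,000 cases with 10 positives, and hypothetical costs of 100 per false negative and 1 per false positive. None of these figures come from the notes.

```python
# Unbalanced classes: a classifier that always predicts "negative".
n_total, n_pos = 1000, 10               # hypothetical population
always_negative_correct = n_total - n_pos
acc_majority = always_negative_correct / n_total  # high accuracy, yet no positive is ever found

# Unequal costs: hypothetical cost of each error type.
COST_FN = 100.0  # assumed cost of missing a positive
COST_FP = 1.0    # assumed cost of a false alarm


def total_cost(fp, fn):
    """Total cost of a classifier's errors under the assumed per-error costs."""
    return fp * COST_FP + fn * COST_FN


# Two hypothetical classifiers that each make 10 errors (same accuracy).
cost_a = total_cost(fp=0, fn=10)   # all errors are false negatives
cost_b = total_cost(fp=10, fn=0)   # all errors are false positives
```

Both classifiers score the same accuracy, but under these assumed costs the first is one hundred times more expensive than the second.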
The general form of an expected value calculation
EV = p(o_1) · v(o_1) + p(o_2) · v(o_2) + p(o_3) · v(o_3) + ...
where each o_i is a possible outcome, p(o_i) is its probability, and v(o_i) is its value.
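The general form is a probability-weighted sum of values. The sketch below assumes the outcomes are given as (probability, value) pairs whose probabilities sum to 1. The function name and the numbers are illustrative.

```python
import math


def expected_value(outcomes):
    """Sum of p(o_i) * v(o_i) over (probability, value) pairs."""
    outcomes = list(outcomes)
    total_p = sum(p for p, _ in outcomes)
    # The outcomes are assumed to be exhaustive and mutually exclusive.
    if not math.isclose(total_p, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total_p}, expected 1.0")
    return sum(p * v for p, v in outcomes)


# Hypothetical outcomes: (probability, value).
ev = expected_value([(0.5, 10.0), (0.3, 0.0), (0.2, -5.0)])
```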
Using Expected Value to Frame Classifier Evaluation
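One way to apply the expected value form to evaluation is to treat each cell of the confusion matrix as an outcome: its probability is the cell count divided by the total, and its value is the benefit (or cost) assigned to that kind of decision. The sketch below assumes this setup. The cell counts and benefit figures are hypothetical, and the dictionary layout is only one possible representation.

```python
def expected_profit(confusion_counts, benefits):
    """Expected profit per decision.

    Both arguments are dicts keyed by (predicted, actual) class pairs.
    """
    total = sum(confusion_counts.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # p(cell) = count / total; weight each cell's benefit by that probability.
    return sum(
        (count / total) * benefits[cell]
        for cell, count in confusion_counts.items()
    )


# Hypothetical counts keyed by (predicted, actual); Y/N = predicted, p/n = actual.
counts = {("Y", "p"): 50, ("Y", "n"): 10, ("N", "p"): 5, ("N", "n"): 35}
# Hypothetical benefits: acting on a true positive earns 99, a false positive costs 1,
# and not acting is assumed to cost or earn nothing.
benefits = {("Y", "p"): 99.0, ("Y", "n"): -1.0, ("N", "p"): 0.0, ("N", "n"): 0.0}

profit = expected_profit(counts, benefits)
```

Two classifiers can then be compared on expected profit rather than on accuracy, which brings the unequal costs of the two error types into the evaluation.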
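Using Expected Value to Frame Classifier Use
Expected benefit of targeting = p_R(x) · v_R + [1 - p_R(x)] · v_NR
where p_R(x) is the estimated probability that consumer x responds, v_R is the value obtained from a response, and v_NR is the value obtained from no response.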
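A small sketch of how the expected benefit of targeting could drive a decision: target a consumer only when the expected benefit is positive. The values for a response (99) and a non-response (-1) are hypothetical, as are the function names.

```python
def expected_benefit_of_targeting(p_response, v_response, v_no_response):
    """p_R * v_R + (1 - p_R) * v_NR for a single consumer."""
    return p_response * v_response + (1.0 - p_response) * v_no_response


def break_even_probability(v_response, v_no_response):
    """Response probability at which the expected benefit is exactly zero."""
    return -v_no_response / (v_response - v_no_response)


# Hypothetical values: a response is worth 99, a wasted offer costs 1.
V_R, V_NR = 99.0, -1.0

benefit_likely = expected_benefit_of_targeting(0.05, V_R, V_NR)     # positive: target
benefit_unlikely = expected_benefit_of_targeting(0.005, V_R, V_NR)  # negative: skip
threshold = break_even_probability(V_R, V_NR)
```

Under these assumed values the break-even response probability is low, so targeting pays off even for consumers who are quite unlikely to respond.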
Evaluation, Baseline Performance, and Implications for Investments in Data
It is important to consider carefully what would be a reasonable baseline against which to compare model performance.
Example: weather forecasters have two simple baseline models that they compare against.
1. Persistence: predicts that the weather tomorrow is going to be whatever it was today.
2. Climatology: predicts whatever the average historical weather has been on this day from prior years.
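The two weather baselines can be sketched in a few lines. The example assumes a numeric target such as daily temperature. The function names, the temperature values, and the choice of mean absolute error as the comparison metric are all illustrative assumptions.

```python
from statistics import mean


def persistence_forecast(today_value):
    """Persistence baseline: tomorrow will be whatever today was."""
    return today_value


def climatology_forecast(same_day_prior_years):
    """Climatology baseline: average of this calendar day over prior years."""
    return mean(same_day_prior_years)


def mean_absolute_error(actual, forecast):
    """Average absolute gap between actual and forecast values."""
    return mean(abs(a - f) for a, f in zip(actual, forecast))


# Hypothetical daily temperatures for four consecutive days.
temps = [20.0, 22.0, 21.0, 25.0]
# Persistence forecasts for days 2..4 are simply the values of days 1..3.
persistence_preds = [persistence_forecast(t) for t in temps[:-1]]
persistence_mae = mean_absolute_error(temps[1:], persistence_preds)

# Hypothetical readings for the same calendar day in three prior years.
climatology_pred = climatology_forecast([18.0, 20.0, 22.0])
```

A learned model is worth its cost only if its error is meaningfully lower than what such simple baselines achieve on the same data.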