Ch. 7 Decision Analytic Thinking I

Evaluating Classifiers

To note: the two classes

Negatives: harmless instances, uninteresting or benign

Positives: the "bad" instances, worthy of attention

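Classifier Accuracy

accuracy = number of correct decisions / total number of decisions

a measure of classifier performance, but too simplistic, with well-known issues

Problems with Unequal Costs and Benefits

accuracy counts every error the same, whatever it actually costs

False negatives: can be a costly mistake (e.g., telling a patient who has cancer that she does not)

False positives: carry a different, often smaller cost (e.g., extra tests for a healthy patient)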
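A minimal sketch of why plain accuracy is too simplistic when errors have unequal costs. Two hypothetical classifiers make exactly one mistake each on the same ten instances, so their accuracy is identical, yet one mistake is a false negative and the other a false positive. The labels (1 = positive, 0 = negative) and the cost figures are made up for illustration.

```python
# Minimal sketch: plain accuracy versus a cost that treats errors unequally.
# Labels use 1 = positive, 0 = negative. All data and costs are hypothetical.

def accuracy(actual, predicted):
    """Fraction of instances whose predicted label matches the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def error_cost(actual, predicted, cost_fn, cost_fp):
    """Total cost of the mistakes, charging each error type its own cost."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 0:    # false negative: a positive was missed
            total += cost_fn
        elif a == 0 and p == 1:  # false positive: a negative was flagged
            total += cost_fp
    return total


actual      = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted_a = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # makes one false negative
predicted_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # makes one false positive

COST_FN = 100.0  # hypothetical: a miss is far more expensive...
COST_FP = 5.0    # ...than a false alarm

for name, pred in [("A", predicted_a), ("B", predicted_b)]:
    print(name, accuracy(actual, pred), error_cost(actual, pred, COST_FN, COST_FP))
# A 0.9 100.0
# B 0.9 5.0
```

Both classifiers score 0.9, but under these assumed costs classifier A's single error is twenty times as expensive as B's.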

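Confusion Matrix

a contingency table of predicted class versus actual class (n × n for n classes)

shows how one class is confused for another

false positive: a negative instance classified as positive

false negative: a positive instance classified as negative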
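A small sketch of building a two-class confusion matrix by counting (predicted, actual) pairs. Putting predicted classes on the rows and actual classes on the columns is a layout choice made here; the class labels "p"/"n" and the data are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels=("p", "n")):
    """Return {(predicted, actual): count} for every pair of class labels.

    Off-diagonal cells count instances of one class confused for another.
    """
    counts = Counter(zip(predicted, actual))
    return {(pred, act): counts[(pred, act)] for pred in labels for act in labels}


actual    = ["p", "p", "p", "n", "n", "n", "n", "n"]
predicted = ["p", "n", "p", "p", "n", "n", "n", "n"]

cm = confusion_matrix(actual, predicted)
true_pos  = cm[("p", "p")]
false_pos = cm[("p", "n")]  # negative instance classified as positive
false_neg = cm[("n", "p")]  # positive instance classified as negative
true_neg  = cm[("n", "n")]

print("            actual p  actual n")
print(f"predicted p {true_pos:8d}  {false_pos:8d}")
print(f"predicted n {false_neg:8d}  {true_neg:8d}")
```

Here the matrix holds 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so the two error types stay visible instead of being merged into one accuracy number.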

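Problems with Unbalanced Data

class distribution: how the instances are split across the classes

when the unusual class is rare, it skews the distribution

accuracy breaks down: always predicting the common class can score very high accuracy (99.9% if only 1 instance in 1,000 is positive) while catching none of the unusual cases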

the common class may dominate the accuracy figure, even though errors on it may be less costly

instead, weigh the costs and benefits of each classifier's decisions

compare classifiers by their expected profit
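A minimal sketch of how accuracy breaks down on unbalanced data. With a hypothetical 1 positive in 1,000 instances, a majority classifier that always predicts negative beats a hypothetical model on accuracy, while the model is better once an assumed benefit for catching a positive and an assumed cost per false alarm are applied.

```python
# Minimal sketch: accuracy on unbalanced data versus expected profit.
# Hypothetical data: 1 positive among 1,000 instances (1 = positive, 0 = negative).
actual = [1] + [0] * 999


def accuracy(actual, predicted):
    """Fraction of correct predictions (True counts as 1 in the sum)."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical benefit and cost of acting on a positive prediction.
BENEFIT_TP = 500.0  # value of catching a real positive
COST_FP = -2.0      # cost of acting on a false alarm


def profit_per_instance(actual, predicted):
    """Average profit; only instances predicted positive trigger an action."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if p == 1:
            total += BENEFIT_TP if a == 1 else COST_FP
    return total / len(actual)


# Majority classifier: always predicts the common (negative) class.
majority_pred = [0] * len(actual)

# Hypothetical model: catches the one positive but raises 10 false alarms.
model_pred = [1] + [1] * 10 + [0] * 989

for name, pred in [("majority", majority_pred), ("model", model_pred)]:
    print(name, accuracy(actual, pred), profit_per_instance(actual, pred))
```

The majority classifier scores 0.999 accuracy and earns nothing; the model scores 0.99 but earns about 0.48 per instance under these assumed values.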

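Expected Value

a framework for analytics: EV = p(o_1)·v(o_1) + p(o_2)·v(o_2) + ..., summed over the possible outcomes o_i of a decision, where p(o_i) is an outcome's probability and v(o_i) its value

decomposes data-analytic thinking into three parts:

the structure of a problem

elements of the analysis that can be extracted from the data (e.g., probabilities estimated by a model)

elements that need to be acquired from other sources (e.g., costs and benefits from business knowledge)

decision rule: act (e.g., target a customer) only if the expected value of acting is > 0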

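class priors: p(p) and p(n), the base rates of the positive and negative classes

factor the priors out of the expected-profit calculation, separating the class distribution from the classifier's own error rates:

expected profit = p(p)·[p(Y|p)·b(Y,p) + p(N|p)·c(N,p)] + p(n)·[p(N|n)·b(N,n) + p(Y|n)·c(Y,n)], where Y/N is the predicted class, b a benefit and c a cost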
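A minimal sketch of the expected value calculations: first the "value > 0" decision rule for a single targeting decision, then a classifier's expected profit computed two ways, once from the joint (predicted, actual) probabilities and once with the class priors factored out. The response probability, the confusion-matrix counts and the cost-benefit values are all hypothetical, and the dictionary layout is just one convenient encoding.

```python
def expected_value(probs_and_values):
    """EV = sum over outcomes of p(outcome) * v(outcome)."""
    return sum(p * v for p, v in probs_and_values)


# Decision rule: target only if the EV of targeting is > 0 (hypothetical numbers).
p_respond = 0.05     # model's estimated probability that this customer responds
v_respond = 99.0     # value if the customer responds
v_no_respond = -1.0  # value (a cost) if the customer does not respond
ev_target = expected_value([(p_respond, v_respond), (1 - p_respond, v_no_respond)])
print("EV of targeting:", ev_target, "-> target" if ev_target > 0 else "-> skip")

# Hypothetical confusion-matrix counts keyed by (predicted, actual),
# with predicted in {"Y", "N"} and actual in {"p", "n"}.
counts = {("Y", "p"): 40, ("N", "p"): 10, ("Y", "n"): 60, ("N", "n"): 890}

# Hypothetical cost-benefit matrix: value of each (predicted, actual) outcome.
value = {("Y", "p"): 99.0, ("N", "p"): 0.0, ("Y", "n"): -1.0, ("N", "n"): 0.0}


def expected_profit_joint(counts, value):
    """Sum of p(predicted, actual) * value over all four cells."""
    total = sum(counts.values())
    return sum(counts[k] / total * value[k] for k in counts)


def expected_profit_factored(counts, value):
    """Same quantity with the class priors p(p) and p(n) factored out."""
    pos = counts[("Y", "p")] + counts[("N", "p")]
    neg = counts[("Y", "n")] + counts[("N", "n")]
    prior_p, prior_n = pos / (pos + neg), neg / (pos + neg)
    pos_part = (counts[("Y", "p")] / pos) * value[("Y", "p")] \
        + (counts[("N", "p")] / pos) * value[("N", "p")]
    neg_part = (counts[("N", "n")] / neg) * value[("N", "n")] \
        + (counts[("Y", "n")] / neg) * value[("Y", "n")]
    return prior_p * pos_part + prior_n * neg_part


print(expected_profit_joint(counts, value), expected_profit_factored(counts, value))
```

Both forms give about 3.9 per instance here; the factored form makes it easy to re-evaluate the same classifier's rates under a different class distribution by changing only the priors.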

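Baseline Performance

classification baseline: the majority classifier, which always predicts the most common class

consider what is required of the data mining results, i.e., what a model must beat to be worth using

regression problems

baseline: predict the average value over the population

use the mean or the median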
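A minimal sketch of the two simple baselines: a majority classifier for classification and a constant mean or median prediction for regression. The function names and the data are hypothetical; each baseline ignores the instance it is asked about.

```python
from collections import Counter
from statistics import mean, median


def majority_baseline(labels):
    """Classification baseline: always predict the most common class."""
    most_common_label, _count = Counter(labels).most_common(1)[0]
    return lambda _instance: most_common_label


def average_baseline(targets, use_median=False):
    """Regression baseline: always predict the population mean (or median)."""
    center = median(targets) if use_median else mean(targets)
    return lambda _instance: center


labels = ["stay", "stay", "churn", "stay"]  # hypothetical class labels
targets = [10.0, 12.0, 11.0, 100.0]         # hypothetical values, with one outlier

predict_class = majority_baseline(labels)
predict_mean = average_baseline(targets)
predict_median = average_baseline(targets, use_median=True)

print(predict_class(None), predict_mean(None), predict_median(None))
```

The outlier pulls the mean up to 33.25 while the median stays at 11.5, which is one reason to consider the median as the baseline.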

decision stump: a decision tree with only one internal node

uses the single most informative piece of information (one attribute)

bases all decisions on it
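A minimal sketch of a decision stump over categorical attributes. Measuring "most informative" by information gain (the entropy-based measure commonly used to build decision trees) is an assumption of this sketch; the attribute names, values and labels are hypothetical. The stump predicts the majority label within each value of the chosen attribute and falls back to the overall majority for unseen values.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def split_by(rows, labels, attr):
    """Group labels by the value each row has for `attr`."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[attr]].append(label)
    return groups


def information_gain(rows, labels, attr):
    """Parent entropy minus the weighted entropy after splitting on `attr`."""
    groups = split_by(rows, labels, attr)
    children = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - children


def fit_stump(rows, labels):
    """Pick the single most informative attribute and base all decisions on it."""
    best = max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))
    groups = split_by(rows, labels, best)
    rule = {value: Counter(g).most_common(1)[0][0] for value, g in groups.items()}
    fallback = Counter(labels).most_common(1)[0][0]  # for unseen attribute values
    return best, lambda row: rule.get(row[best], fallback)


rows = [
    {"plan": "prepaid", "region": "east"},
    {"plan": "prepaid", "region": "west"},
    {"plan": "contract", "region": "east"},
    {"plan": "contract", "region": "west"},
]
labels = ["churn", "churn", "stay", "stay"]

best_attr, predict = fit_stump(rows, labels)
print(best_attr, predict({"plan": "prepaid", "region": "west"}))
```

In this toy data "plan" separates the classes perfectly (gain 1.0 bit) while "region" tells nothing (gain 0.0), so the stump splits on "plan" alone.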