CH 7 What is a good model?
Key analytical framework: expected value
using expected value to frame classifier evaluation
does the model perform better than a different model?
what is its expected value?
the calculation enumerates the possible outcomes
Expected value is the weighted average of the different possible outcomes
Each weight equals the probability of that outcome occurring
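Written out, EV = p(o1)·v(o1) + p(o2)·v(o2) + …, where p is an outcome's probability and v its value. A minimal Python sketch of that weighted average, assuming outcomes are supplied as hypothetical (probability, value) pairs; the probabilities and dollar values are illustrative only.

```python
def expected_value(outcomes):
    """Weighted average of outcome values, each weighted by its probability.

    `outcomes` is a list of (probability, value) pairs; this input format is
    an assumption made for illustration.
    """
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Illustrative numbers: a targeted customer responds with probability 0.05
# (value +99) or does not respond with probability 0.95 (value -1).
ev = expected_value([(0.05, 99.0), (0.95, -1.0)])
print(round(ev, 2))  # 4.0
```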
costs and benefits
A cost or benefit is needed for each decision pair (predicted class, actual class) to calculate expected profit
problems
double counting
putting a benefit in one cell and a cost for the same thing in another cell
signs in the cost-benefit matrix must be consistent
costs are negative
benefits are positive
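A cost-benefit matrix can be sketched as a dictionary keyed by (predicted, actual) decision pairs. The targeted-marketing values below are hypothetical, and the labels "Y"/"N" (predicted) and "p"/"n" (actual) are assumptions for illustration. The sketch follows the sign convention (benefits positive, costs negative) and shows one way double counting can creep in.

```python
# Hypothetical cost-benefit matrix for a targeted-marketing classifier.
# Keys are (predicted, actual): "Y"/"N" = predicted label, "p"/"n" = actual class.
# Sign convention: benefits > 0, costs < 0.
COST_BENEFIT = {
    ("Y", "p"): 99.0,   # targeted and responded: profit minus mailing cost
    ("Y", "n"): -1.0,   # targeted but did not respond: mailing cost only
    ("N", "p"): 0.0,    # not targeted: nothing spent, nothing earned
    ("N", "n"): 0.0,
}
# ("N", "p") is 0, not -99: recording the missed profit as a cost here while
# also crediting it as a benefit in ("Y", "p") would count it twice.


def total_profit(decisions, matrix=COST_BENEFIT):
    """Sum the matrix value of each (predicted, actual) decision pair."""
    return sum(matrix[pair] for pair in decisions)


decisions = [("Y", "p"), ("Y", "n"), ("Y", "n"), ("N", "p"), ("N", "n")]
print(total_profit(decisions))  # 97.0
```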
using the expected value to frame classifier use
expected value provides a framework for carrying out the analysis
expected benefits or costs
error rates
Probabilities are estimated from the tallies in the confusion matrix by computing the rates of errors and correct decisions.
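A sketch of that estimation step in Python, assuming hypothetical confusion-matrix counts and cost-benefit values: each joint rate p(predicted, actual) is a cell count divided by the total number of decisions, and the expected profit per decision is the sum of each rate times its cost or benefit.

```python
# Hypothetical confusion-matrix tallies keyed by (predicted, actual):
# "Y"/"N" = predicted label, "p"/"n" = actual class.
counts = {
    ("Y", "p"): 40,   # true positives
    ("Y", "n"): 160,  # false positives
    ("N", "p"): 10,   # false negatives
    ("N", "n"): 790,  # true negatives
}

# Hypothetical cost-benefit values (benefits positive, costs negative).
COST_BENEFIT = {("Y", "p"): 99.0, ("Y", "n"): -1.0,
                ("N", "p"): 0.0, ("N", "n"): 0.0}


def rates(counts):
    """Estimate each joint probability p(predicted, actual) as count / total."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def expected_profit(counts, cost_benefit):
    """Expected profit per decision: sum of rate * value over all decision pairs."""
    return sum(p * cost_benefit[pair] for pair, p in rates(counts).items())


print(rates(counts))
print(round(expected_profit(counts, COST_BENEFIT), 2))  # 3.8
```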
What do you want to achieve by mining data?
Evaluation, Baseline Performance, and Implications for Investments in Data
things to consider
What would be reasonable to compare model performance to?
A good simple baseline: the majority classifier
always predicts the majority class
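A majority-class baseline takes only a few lines. This Python sketch uses hypothetical churn labels; `majority_classifier` is an illustrative helper, not a library function. A candidate model is only interesting if it beats this accuracy on the same test data.

```python
from collections import Counter


def majority_classifier(train_labels):
    """Build a baseline 'model' that always predicts the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _example: majority  # ignores its input entirely


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical churn data: labels only, since the baseline never looks at features.
train = ["stay"] * 90 + ["churn"] * 10
test = ["stay"] * 85 + ["churn"] * 15

baseline = majority_classifier(train)
preds = [baseline(None) for _ in test]
print(accuracy(preds, test))  # 0.85
```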
Data Sources
data should be treated as an asset worth investing in
Baseline comparison
something stakeholders find informative and, ideally, persuasive
Evaluation of classifiers
Plain accuracy and its problems
problem: it is too simplistic
accuracy = number of correct decisions / total number of decisions
a confusion matrix helps reveal these problems
negative = the normal (good) outcome
uninteresting or benign
positive = the outcome of interest, often a bad one
worthy of attention or alarm
Confusion matrix
makes explicit how one class is being confused for another
different types of error are dealt with separately
separates out the decisions made by the classifier
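A confusion matrix is a tally of (predicted, actual) pairs. This Python sketch builds one with `collections.Counter` from hypothetical predictions ("Y"/"N") and true classes ("p"/"n"), then shows how plain accuracy collapses the four separate cells into a single number.

```python
from collections import Counter


def confusion_matrix(predicted, actual):
    """Tally (predicted, actual) pairs; missing pairs count as 0 (Counter default)."""
    return Counter(zip(predicted, actual))


# Hypothetical predictions ("Y"/"N") and true classes ("p"/"n").
predicted = ["Y", "Y", "N", "N", "Y", "N"]
actual = ["p", "n", "p", "n", "p", "n"]
cm = confusion_matrix(predicted, actual)

# Rows: predicted label; columns: actual class (p, n).
for pred in ("Y", "N"):
    print(pred, [cm[(pred, act)] for act in ("p", "n")])
# Y [2, 1]
# N [1, 2]

# Plain accuracy keeps only the diagonal and discards which errors were made.
acc = (cm[("Y", "p")] + cm[("N", "n")]) / sum(cm.values())
print(round(acc, 3))  # 0.667
```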
Problems with Unequal Costs and Benefits
accuracy makes no distinction between false positives and false negatives
false positive = Type I error
false negative = Type II error
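Two classifiers can have identical accuracy yet very different costs once false positives and false negatives are priced separately. The confusion matrices and error costs in this Python sketch are hypothetical; the costs are negative, following the sign convention.

```python
def accuracy(cm):
    """Correct decisions / total decisions."""
    correct = cm[("Y", "p")] + cm[("N", "n")]
    return correct / sum(cm.values())


def error_cost(cm, fp_cost, fn_cost):
    """Total cost of errors; costs are negative by the sign convention."""
    return cm[("Y", "n")] * fp_cost + cm[("N", "p")] * fn_cost


# Two hypothetical models evaluated on the same 1,000 cases.
model_a = {("Y", "p"): 40, ("Y", "n"): 50, ("N", "p"): 10, ("N", "n"): 900}
model_b = {("Y", "p"): 10, ("Y", "n"): 20, ("N", "p"): 40, ("N", "n"): 930}

# Hypothetical costs: a missed positive is 100x worse than a false alarm.
FP_COST, FN_COST = -1, -100

for name, cm in (("A", model_a), ("B", model_b)):
    print(name, accuracy(cm), error_cost(cm, FP_COST, FN_COST))
# A 0.94 -1050
# B 0.94 -4020
```

Accuracy ranks the two models as equal; pricing the errors separately shows model A is far cheaper under these assumed costs.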
Problems with unbalanced classes
accuracy can be misleading when one class is rare
accuracy alone does not tell us how much we care about each kind of error
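With unbalanced classes, a classifier that always predicts the common class can score very high accuracy while catching none of the rare cases. The 1% positive rate in this Python sketch is hypothetical.

```python
# Hypothetical unbalanced test set: 10 positives ("p") among 1,000 cases.
actual = ["p"] * 10 + ["n"] * 990

# A "classifier" that always says no ("N"), i.e. always predicts the majority class.
predicted = ["N"] * len(actual)

# A decision is correct when predicting "Y" coincides with the case being "p".
correct = sum((pr == "Y") == (ac == "p") for pr, ac in zip(predicted, actual))
acc = correct / len(actual)
caught = sum(pr == "Y" and ac == "p" for pr, ac in zip(predicted, actual))

print(acc)     # 0.99
print(caught)  # 0
```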
Generalizing beyond classification
What's the goal?
What is important in this application?
Are we mining the data appropriately for the given goal?