Chapter 7: Decision Analytics Thinking I : (Evaluating Classifiers (The…

- - - - Pros: popular metric and easy to measure. Cons: too simplistic for applying data mining techniques to real business problems.
  - - - Errors of the Classifier
        
        False positives: negative instances classified as positive
        
        False negatives: positive instances classified as negative
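
As a minimal sketch of how the two error types can be tallied separately, the following assumes predictions encoded as "Y"/"N" and actual classes encoded as "p"/"n". The function name `confusion_counts` and the toy data are illustrative assumptions, not part of the source notes.

```python
from collections import Counter

def confusion_counts(predicted, actual):
    """Tally (predicted, actual) pairs into the four confusion-matrix cells."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    counts = Counter(zip(predicted, actual))
    # Make sure all four cells are present, even when a count is zero.
    return {(h, a): counts.get((h, a), 0) for h in ("Y", "N") for a in ("p", "n")}

# Hypothetical toy data: predictions (Y/N) against true classes (p/n).
predicted = ["Y", "Y", "N", "N", "Y", "N"]
actual = ["p", "n", "p", "n", "p", "n"]

counts = confusion_counts(predicted, actual)
false_positives = counts[("Y", "n")]  # negatives classified as positive
false_negatives = counts[("N", "p")]  # positives classified as negative
```

Keeping the two cells apart, rather than summing them into a single error count, is what later allows a different cost to be attached to each.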
  - - - Classifiers A and B make different kinds of errors. Classifier A falsely predicts that customers will churn when they will not; Classifier B falsely predicts that customers will not churn when in fact they will.
        
        Better Model? Classifier B!
  - - - The two kinds of error are very different, carry different costs, and should be counted separately
        
        Solution: estimate the cost or benefit of each decision a classifier can make, then aggregate these into an expected profit (or expected benefit or expected cost)
- - - - 1.) the structure of the problem
      - 2.) the elements of the analysis that can be extracted from the data
      - 3.) the elements of the analysis that need to be acquired from other sources
    - - Equation: EV = p(o1)·v(o1) + p(o2)·v(o2) + p(o3)·v(o3) + ...
    - - provides a framework; use historical data to find probability
    - - Look at how well each model does and what is its expected value
      - Error Rates:
        
        Evaluating: These probabilities can be estimated from the tallies in the confusion matrix by computing the rates of errors and correct decisions
        
        Count(h,a): each cell of the confusion matrix contains a count of the decisions corresponding to that combination of (predicted, actual)
        
        We reduce these counts to rates, or estimated probabilities p(h,a), by dividing each count by the total number of instances
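
The reduction from counts to rates described above can be sketched as follows. The function name `confusion_rates` and the tallies are hypothetical, chosen only to illustrate dividing each count(h,a) by the total number of instances.

```python
def confusion_rates(counts):
    """Divide each count(h, a) by the total number of instances."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {cell: n / total for cell, n in counts.items()}

# Hypothetical tallies keyed by (predicted, actual).
counts = {("Y", "p"): 56, ("Y", "n"): 7, ("N", "p"): 5, ("N", "n"): 42}
rates = confusion_rates(counts)  # estimated joint probabilities p(h, a)
```

Because every instance falls into exactly one cell, the four rates sum to 1.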
      - Costs and Benefits
        
        Correct classifications: true positives and true negatives correspond to the benefits b(Y,p) and b(N,n), respectively
        
        Incorrect classifications: false positives and false negatives correspond to the "benefits" b(Y,n) and b(N,p), respectively, which may well be costs (negative benefits) and are then referred to as c(Y,n) and c(N,p)
        
        Important note: while the probabilities can be estimated from data, the costs and benefits often cannot
        
        Common way of expressing expected profit: p(Y,p)·b(Y,p) + p(N,p)·b(N,p) + p(N,n)·b(N,n) + p(Y,n)·b(Y,n)
        
        Factor out the probabilities of seeing each class, referred to as the class priors
        
        Rule of basic probability: p(x,y) = p(y)·p(x|y)
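
To illustrate the two ways of writing expected profit, the sketch below computes it once as a sum of p(h,a)·b(h,a) over the four cells and once with the class priors factored out, p(p)·[p(Y|p)·b(Y,p) + p(N|p)·b(N,p)] + p(n)·[p(N|n)·b(N,n) + p(Y|n)·b(Y,n)], using p(x,y) = p(y)·p(x|y). The function names, counts, and benefit values are hypothetical, and costs are entered as negative benefits.

```python
def expected_profit_joint(counts, benefits):
    """Sum p(h, a) * b(h, a) over all four confusion-matrix cells."""
    total = sum(counts.values())
    return sum((counts[cell] / total) * benefits[cell] for cell in counts)

def expected_profit_by_prior(counts, benefits):
    """Same quantity with the class priors p(p) and p(n) factored out."""
    total = sum(counts.values())
    profit = 0.0
    for a in ("p", "n"):
        class_total = counts[("Y", a)] + counts[("N", a)]
        if class_total == 0:
            continue  # class never observed, so it contributes nothing
        prior = class_total / total  # p(a)
        # Sum over h of p(h | a) * b(h, a)
        conditional = sum(
            (counts[(h, a)] / class_total) * benefits[(h, a)] for h in ("Y", "N")
        )
        profit += prior * conditional
    return profit

# Hypothetical tallies and benefits, keyed by (predicted, actual).
counts = {("Y", "p"): 56, ("Y", "n"): 7, ("N", "p"): 5, ("N", "n"): 42}
benefits = {("Y", "p"): 99.0, ("Y", "n"): -1.0, ("N", "p"): 0.0, ("N", "n"): 0.0}

joint = expected_profit_joint(counts, benefits)
factored = expected_profit_by_prior(counts, benefits)
```

The two results agree up to floating-point rounding. The factored form is convenient when the class priors in the target population differ from those in the evaluation data, since the priors can then be replaced without recomputing the conditional rates.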
- - - - Naive classifier that always chooses the majority class of the training dataset
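
A minimal sketch of such a majority-class baseline is shown below. The class name `MajorityClassifier` and the toy labels are assumptions for illustration, not part of the source notes.

```python
from collections import Counter

class MajorityClassifier:
    """Baseline that always predicts the most frequent training class."""

    def fit(self, labels):
        if not labels:
            raise ValueError("need at least one training label")
        # most_common(1) returns [(label, count)] for the top label.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        return [self.majority_] * n_instances

# Hypothetical unbalanced data: 9 negatives for every positive.
train_labels = ["n"] * 9 + ["p"]
test_labels = ["n"] * 9 + ["p"]

clf = MajorityClassifier().fit(train_labels)
predictions = clf.predict(len(test_labels))
accuracy = sum(h == a for h, a in zip(predictions, test_labels)) / len(test_labels)
```

On data this unbalanced the baseline reaches 90% accuracy without identifying a single positive, which is why a model's accuracy should be compared against it rather than read in isolation.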