Decision Analytic Thinking I: What Is a Good Model? (A key analytical…

- - - - One worthy of attention, an alarm
    - - Uninteresting or benign
  - - - Use confusion matrix
        
        A problem with n classes has an n x n confusion matrix
        
        Columns are labeled with actual classes
        
        Rows are labeled with predicted classes
        
        Separates out the decisions made by the classifier, showing how one class is confused for another
        
        True classes are p(ositive) or n(egative)
        
        Predicted classes are Y(es) or N(o)
        
        Errors of classifiers are false positives & false negatives
        
        Unbalanced classes
        
        As the class distribution becomes more skewed, accuracy breaks down
        
        Even when the skew is not great, accuracy may be misleading in domains where one class is more prevalent than another
        
        Cellular-churn example
        
        Bottom line: accuracy is simply the wrong thing to measure
        
        Both models correctly classify 80% of the balanced population
        
  - - - Accuracy treats both error types as the same, which is rarely the case
      - False positive: patient is wrongly told they have cancer
      - False negative: patient has cancer but is wrongly told they do not
    - - Once aggregated produces expected profit
  - - - What is important in the application?
      - What is the goal?
      - Are we assessing the results of data mining appropriately given the actual goal?
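
The confusion matrix notes above can be sketched in a few lines of Python. This is only an illustration, not something taken from the book: the function names `confusion_matrix` and `accuracy` and the 100-instance sample are hypothetical, chosen to show the layout (rows are predictions Y/N, columns are true classes p/n).

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Tally decisions as m[predicted][actual].

    Rows are predicted classes ("Y"/"N"); columns are actual classes ("p"/"n").
    """
    counts = Counter(zip(predicted, actual))
    return {
        pred: {act: counts[(pred, act)] for act in ("p", "n")}
        for pred in ("Y", "N")
    }


def accuracy(m):
    """Share of all decisions that are true positives or true negatives."""
    total = sum(sum(row.values()) for row in m.values())
    return (m["Y"]["p"] + m["N"]["n"]) / total


# Hypothetical skewed sample: 10 positives followed by 90 negatives.
actual = ["p"] * 10 + ["n"] * 90
# Hypothetical classifier: finds 6 of the positives, raises 5 false alarms.
predicted = ["Y"] * 6 + ["N"] * 4 + ["Y"] * 5 + ["N"] * 85

m = confusion_matrix(actual, predicted)
false_negatives = m["N"]["p"]  # positives the classifier said "No" to
false_positives = m["Y"]["n"]  # negatives the classifier said "Yes" to
print(m)
print(accuracy(m))
```

On this made-up sample accuracy comes out at 0.91 even though 4 of the 10 positives are missed, which is the kind of detail a single accuracy figure hides and the matrix keeps visible.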
- - - - p(o1) is the probability of outcome o1
  - - - Necessary to compare one model to another
    - - Error rates
        
        Estimated from tallies in confusion matrix
      - Cost and benefits
        
        Correct classifications (true positives & true negatives) correspond with benefits b(Y,p) and b(N,n)
        
        Incorrect classifications (false positives & false negatives) correspond with costs c(Y,n) and c(N,p), i.e. negative benefits b(Y,n) and b(N,p)
        
        Can be summarized in a 2x2 matrix
        
        Expected profit equation (too long to type, check book)
        
        If positive examples are very rare, their contribution to expected profit is very small
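
A rough sketch of how the confusion-matrix tallies and the 2x2 cost-benefit matrix might be combined, assuming the usual expected value form: each cell's estimated probability times its value, summed over the four cells, i.e. p(Y,p)·b(Y,p) + p(N,p)·c(N,p) + p(N,n)·b(N,n) + p(Y,n)·c(Y,n). The function names, counts, and values below are hypothetical, and costs are entered as negative numbers.

```python
def expected_profit(counts, value):
    """Sum over cells of estimated probability times value.

    Both dicts are keyed by (decision, true_class), e.g. ("Y", "p").
    Probabilities are estimated from the tallies: count / total.
    """
    total = sum(counts.values())
    return sum(counts[cell] / total * value[cell] for cell in counts)


def expected_profit_by_class(counts, value):
    """Same quantity, factored by class prior: p(p)*[...] + p(n)*[...]."""
    total = sum(counts.values())
    profit = 0.0
    for cls in ("p", "n"):
        class_total = counts[("Y", cls)] + counts[("N", cls)]
        prior = class_total / total  # p(p) or p(n)
        # Expected value within the class, using rates such as p(Y|p).
        within = sum(
            counts[(decision, cls)] / class_total * value[(decision, cls)]
            for decision in ("Y", "N")
        )
        profit += prior * within
    return profit


# Hypothetical tallies from a confusion matrix (100 instances).
counts = {("Y", "p"): 6, ("N", "p"): 4, ("Y", "n"): 5, ("N", "n"): 85}
# Hypothetical cost-benefit matrix: acting on a positive earns 99,
# acting on a negative costs 1, not acting earns nothing.
value = {("Y", "p"): 99, ("N", "p"): 0, ("Y", "n"): -1, ("N", "n"): 0}

print(expected_profit(counts, value))
print(expected_profit_by_class(counts, value))
```

Both forms give roughly 5.89 per instance here. The factored version makes the role of the class priors explicit: each class's term is scaled by how common that class is, so a rare class can only contribute in proportion to its prior.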
- - - - naive classifier that always chooses the majority class of the training dataset
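
A minimal sketch of such a majority-class baseline, assuming plain Python lists of labels. The helper name `fit_majority` and the 90/10 class split are hypothetical, picked only to show how the baseline behaves on skewed data.

```python
from collections import Counter


def fit_majority(train_labels):
    """Return the most frequent class label in the training data."""
    return Counter(train_labels).most_common(1)[0][0]


# Hypothetical skewed training set: 90 negatives, 10 positives.
train_labels = ["n"] * 90 + ["p"] * 10
majority = fit_majority(train_labels)

# Hypothetical test set with the same skew.
test_labels = ["n"] * 45 + ["p"] * 5
predictions = [majority] * len(test_labels)  # always predict the majority class

correct = sum(pred == act for pred, act in zip(predictions, test_labels))
baseline_accuracy = correct / len(test_labels)
positives_found = sum(
    pred == "p" and act == "p" for pred, act in zip(predictions, test_labels)
)

print(majority, baseline_accuracy, positives_found)
```

With this split the baseline reaches 0.9 accuracy without identifying a single positive, so any model evaluated on the same data needs to be compared against that figure rather than against zero.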