Lecture 2
Predictive Modelling
What is it?
Belongs to supervised learning
Since we need to guide the modelling process
So that a specific relationship of interest in data is learned
Is the process of:
Building a model from historical data
To predict an attribute of interest
i.e. predict an outcome
Which can be applied to unseen data
Historical data
Input attributes are known
Target labels are known
Predictive Model
Captures the relationship between the input attributes and the target
New Data
Input attributes are known
Target labels are unknown
We predict target labels
Tasks
Binary Classification
Target label is binary
Examples
Credit scoring
Good
Bad
Spam filter
Spam
Ham
Accept marketing offer
Yes
No
Full Example
Domain
Direct marketing
Goal
Build a model for predicting whether a customer will respond to a letter offering cable TV subscription
Business Rationale
If the company can predict responses, it can prioritise communications
Data
Target label
Offer accepted?
Binary {Yes/No}
Possible attributes
Demographic (age, residence)
Social status (employment, family)
Purchase of other products
Some data may be difficult to get; a data miner would use the attributes that are available
Multi-Class Classification
Target label is categorical
Examples
Recognise which instrument is playing
Violin
Piano
Drums
Driverless car (recognising the type of terrain)
Urban
Forest
Desert
Swamp
Personal email categories
Important
Travel arrangements
Offers
Full example
Domain
Personalisation
Goal
Organise incoming email according to the topic/relevance
From Unknown senders
Rationale
Personal assistance
Saving time
Increasing efficiency
Data
Target label
What type of email?
Categorical
Important
Travel
Marketing
Request for information
Possible attributes
Presence of some keywords
Message length
Mentions a time/date
Regression
Target is numeric
Examples
Predict House Price
Predict temperature for tomorrow
Determine the salary for a job candidate
Predict Olympic medal count
Full example
Domain
Economics/Social science
Task
Predicting Olympic medal counts for countries
Rationale
Understanding how the well-being of a country affects its Olympic performance; scientific curiosity
Data
Target
Medal count for a country in the Olympics
Numeric
Possible attributes
Number of internet users
Total GDP
Total Population
Latitude
Economic freedom index
What do we need for building predictive models?
Target/Label
Need to know what we want to predict
Performance score
To assess how good the model is
Input attributes
Think what kind of attributes may relate to the target label
It should be possible to collect this information for new data
Historical data
The target labels must be known in the historical data to be able to learn the relationship
Predictive Model (algorithm) and modelling tools
Model form (shape) [Predictive model]
Procedure: how to build a model from the data [Algorithm]
Dedicated software or a general-purpose programming language [Modelling tools]
Modelling Procedure
Build a predictive model from historical data
Pretend we don't know the target values (e.g. the selling prices) and predict them using this model
Check how good the predictions are by computing a performance score
e.g. the difference between the true and predicted values (see the sketch below)
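A minimal sketch of this three-step procedure in Python with scikit-learn; the house attributes, selling prices and choice of linear regression below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical data: [floor area (m^2), number of rooms] -> selling price (£)
X_hist = np.array([[50, 2], [70, 3], [90, 3], [120, 4], [150, 5]])
y_hist = np.array([150_000, 200_000, 240_000, 310_000, 380_000])

# 1. Build a predictive model from the historical data
model = LinearRegression().fit(X_hist, y_hist)

# 2. Pretend we don't know the selling prices and predict them with the model
y_pred = model.predict(X_hist)

# 3. Compute a performance score, e.g. the average absolute difference
#    between the true and predicted values
print(np.mean(np.abs(y_hist - y_pred)))
```

Scoring the model on the very data it was trained on is done here only to keep the sketch short; the next node explains why that is dangerous.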
Model Testing
Option 1
Test on the same data the model was trained on
Danger of over-fitting
Option 2
Split the historical data into several parts
Use one part for training and test on a different part
Typical procedures
Hold out test
Cross-validation
Leave-one-out-cross-validation
Issues with models
Over-fitting
Wave example
Every wave is slightly different
If the first wave is used as the model, the next wave will not be the same
We want to capture the generic underlying relationships in the data, and not one-off noise or occasional errors
Spam filter example
When your predictive model describes occasional errors or noise rather than the underlying relationship
Preventing over-fitting
Algorithm-specific techniques (regularisation)
Cross-validation
Reserving unseen data for testing
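As a sketch of an algorithm-specific technique, the snippet below contrasts an unregularised polynomial fit with a ridge-regularised one; the toy data, polynomial degree and `alpha` value are arbitrary choices for illustration, not values from the lecture:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=20)  # signal + noise

# A flexible model with no regularisation is free to chase the one-off noise
flexible = make_pipeline(PolynomialFeatures(degree=9), LinearRegression()).fit(X, y)

# The same model form with a ridge penalty discourages extreme coefficients,
# which tends to reduce over-fitting
regularised = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=0.1)).fit(X, y)
```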
Under-fitting
Relations in data are more complex than the chosen model
If you have the choice between a simple and a complicated model, pick the simple one: it is less likely to over-fit
Building a good predictive model
Can be challenging
Intuition and experience guided search
Trial and error
Often one tries many alternative algorithms
Training-testing-training-testing
Try to make the model building process systematic
e.g.
Step changes in model parameters
Rethinking model choices
Revising attribute selection
Testing Procedures
Hold out evaluation
Split the historical data into two parts
Training data
Used for training a model
Testing data (hold out set)
Used for testing the model
Testing on the test set many times risks over-fitting
We may over-fit while doing model or feature selection
Unintentionally guiding the modelling process towards learning more and more of the noise in the data
Sign of over-fitting
Performance score measured on the training data is much better than on the new data
Possible Remedy
Put aside another test-set that would be used only once for final testing
What if the final testing is not good enough?
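A possible way to set this up with scikit-learn's train_test_split; `X` and `y` stand for the input attributes and target labels of the historical data and are assumed to be loaded already:

```python
from sklearn.model_selection import train_test_split

# Put aside a final test set that will be used only once, at the very end
X_rest, X_final, y_rest, y_final = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the rest into training data and a hold-out set used during
# model and feature selection
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```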
Cross-validation
If we do not have a lot of data
We can use as much data as possible for training
By swapping training and testing sets
2 rounds → 2-fold cross-validation
Compute the average performance score over rounds 1 and 2
10-fold cross-validation
This way we can use all the historical data for testing (in turn)
n-fold cross-validation (n rounds)
Don't forget to mix the order of examples before splitting
For very small data-sets
Use leave-one-out testing
Only one example serves as testing data each round
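A sketch of both procedures with scikit-learn; `X` and `y` are again the historical attributes and labels (assumed loaded), and logistic regression is just a placeholder model:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

model = LogisticRegression(max_iter=1000)

# 10-fold cross-validation; shuffle=True mixes the order of examples before splitting
scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print(scores.mean())  # average performance score over the 10 rounds

# Leave-one-out for very small data sets: each round tests on a single example
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loo_scores.mean())
```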
Evaluation
(Performance scores)
Classification accuracy/error
(applies to binary or multi-class classification)
The fraction of correct/incorrect predictions in all attempts
Classification accuracy = correct predictions / total number of predictions
7/10 = 70%
Classification error = wrong predictions / total number of predictions
3/10 = 30%
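The same 7-out-of-10 example computed with scikit-learn; the labels below are made up so that exactly 7 predictions are correct:

```python
from sklearn.metrics import accuracy_score

# Ten made-up predictions, 7 of them correct
y_true = ["yes", "yes", "yes", "no", "no", "no", "no", "yes", "yes", "no"]
y_pred = ["yes", "yes", "yes", "no", "no", "no", "no", "no",  "no",  "yes"]

accuracy = accuracy_score(y_true, y_pred)  # 7/10 = 0.7
error = 1 - accuracy                       # 3/10 = 0.3
```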
How good are predictions?
We can compare the performance to baselines
A baseline is the accuracy (or error) achieved by a "naive" prediction strategy
e.g.
Random guessing
Majority class
Classification accuracy = 5/10 = 50%
Accuracy of random guessing:
Binary classification: 50%
For a 3-class problem: 33%
For a 4-class problem: 25%
For a k-class problem: 1/k
Class Imbalance
Classes may have different sizes
e.g.
Many more healthy companies than bankrupt
Very few travel emails but lots of advertisements and meeting emails
Accuracy of majority class = 7/10 = 70%
Is this good?
If classes are imbalanced or one class is more important than the others we need more advanced evaluation measures
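One way to compute such baselines is scikit-learn's DummyClassifier; the tiny 7-vs-3 "healthy"/"bankrupt" data set below is invented to match the 70% majority-class example:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Invented imbalanced data: 7 healthy vs 3 bankrupt companies (attributes are placeholders)
X = np.zeros((10, 1))
y = np.array(["healthy"] * 7 + ["bankrupt"] * 3)

# Majority-class baseline: always predict the most frequent class -> 7/10 = 70%
majority = DummyClassifier(strategy="most_frequent").fit(X, y)
print(accuracy_score(y, majority.predict(X)))

# Random-guessing baseline: about 1/k accuracy for k classes (here k = 2, so roughly 50%)
guess = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
print(accuracy_score(y, guess.predict(X)))
```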
Confusion matrix
One class is more "interesting" than the other
e.g.
Relevant documents vs irrelevant
Fraudulent credit card transactions vs normal
Customers that subscribed following the offer vs other customers
We can look at what kind of mistakes are made in predictions
This is for binary classification, but a confusion matrix can be drawn for multi-class classification (for 3 classes - 3 rows and 3 columns)
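A sketch with scikit-learn's confusion_matrix; the labels are invented so that the counts (TP = 2, FP = 2, FN = 1, TN = 5) match the precision/recall example that follows:

```python
from sklearn.metrics import confusion_matrix

# "pos" is the interesting class (e.g. a fraudulent transaction)
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]

# Rows are true classes, columns are predicted classes:
# [[TP, FN],
#  [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=["pos", "neg"]))
# [[2 1]
#  [2 5]]
```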
Precision, recall, F-score
Precision
The fraction of true positives in all examples reported as positives
Precision = True Positive / (True Positive + False Positive)
Recall
The fraction of true positives in all examples that were actually positives in the testing set
Recall = True Positive / (True Positive + False Negative)
F-score
Combines the two
F-score = (2 * Recall * Precision) / (Recall + Precision)
e.g. airport security checks
Precision
How many out of those checked were carrying a gun
Recall
How many out of those who were actually carrying a gun were caught
Good: Precision, Recall, F-score is close to 1
Bad: Precision, Recall, F-score is close to 0
Precision = 2 / (2 + 2) = 50%
Recall = 2 / (2 + 1) ≈ 67%
F-score = 2 × 0.5 × 0.67 / (0.5 + 0.67) ≈ 0.57
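The same worked numbers reproduced with scikit-learn, reusing the invented labels from the confusion-matrix sketch above (TP = 2, FP = 2, FN = 1):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]

print(precision_score(y_true, y_pred, pos_label="pos"))  # 2/(2+2) = 0.50
print(recall_score(y_true, y_pred, pos_label="pos"))     # 2/(2+1) ≈ 0.67
print(f1_score(y_true, y_pred, pos_label="pos"))         # ≈ 0.57
```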
Cost sensitive evaluation
Different types of mistakes may be assigned different costs
e.g.
Classifying spam as ham is not as bad as classifying a relevant email as spam
Spam as ham costs £1
Ham as spam costs £5
Classification error = (10+20)/100 = 30%
Error cost = 10*£1 + 20*£5 = £110
Classification error = (25+5)/100 = 30%
Error cost = 25*£1 + 5*£5 = £50
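A short sketch of the cost calculation for the two models above; the mistake counts and the £1/£5 costs come from the example, while the helper function itself is just illustrative:

```python
COST_SPAM_AS_HAM = 1  # £1 for letting a spam message through
COST_HAM_AS_SPAM = 5  # £5 for losing a relevant email to the spam folder

def error_cost(spam_as_ham: int, ham_as_spam: int) -> int:
    """Total cost (in £) of the mistakes made on the test emails."""
    return spam_as_ham * COST_SPAM_AS_HAM + ham_as_spam * COST_HAM_AS_SPAM

# Both models make 30 mistakes out of 100 (30% error), but at very different costs
print(error_cost(10, 20))  # £110
print(error_cost(25, 5))   # £50
```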
Evaluation of regression
Mean absolute error (MAE)
MAE = sum(absolute(errors)) / number of predictions
MAE = (|-9|+|6|+|1|+|-2|+|-7|) / 5 = 5.0
The lower the MAE the better
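The MAE example recomputed in Python; the true values are made up, and the predictions are chosen so that the five errors are exactly -9, 6, 1, -2 and -7:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

errors = np.array([-9, 6, 1, -2, -7])         # the errors from the example above
y_true = np.array([100, 100, 100, 100, 100])  # made-up true values
y_pred = y_true - errors                      # predictions producing those errors

print(np.mean(np.abs(errors)))                # 5.0, by hand
print(mean_absolute_error(y_true, y_pred))    # 5.0, with scikit-learn
```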
Qualitative evaluation
Interpretability of predictions
White-box (interpretable) vs black-box models
White-box models (explainable predictions)
e.g.
Decision trees
Rules
Black-box models (model predictions are hard to explain)
e.g.
Neural networks
Support vector machines
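To illustrate why decision trees count as white-box models, the sketch below trains a small tree on scikit-learn's bundled iris data (an arbitrary choice) and prints it as human-readable rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The learned model can be read directly as if-then rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```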
Robustness and stability
Does the model consistently perform well?
Or does it sometimes perform very well, but from time to time is extremely wrong?
Business Perspective
What predictions are going to be used for?
e.g.
Making very confident predictions for a small number of customers vs less confident predictions for many more
Predicting the direction of change in the exchange rate correctly vs minimising the absolute error