Lecture 2
Predictive Modelling
What is it?
Belongs to supervised learning
Since we need to guide the modelling process
So that a specific relationship of interest in data is learned
Is the process of:
Building a model from historical data
To predict an attribute of interest
i.e. predict an outcome
Which can be applied to unseen data
Historical data
Input attributes are known
Target labels are known
Predictive Model
Captures the relationship between the input attributes and the target
New Data
Input attributes are known
Target labels are unknown
We predict target labels
Tasks
Binary Classification
Target label is binary
Examples
Credit scoring
Good
Bad
Spam filter
Spam
Ham
Accept marketing offer
Yes
No
Full Example
Domain
Direct marketing
Goal
Build a model for predicting whether a customer will respond to a letter offering cable TV subscription
Business Rationale
If the company can predict responses, it can prioritise communications
Data
Target label
Offer accepted?
Binary {Yes/No}
Possible attributes
Demographic (age, residence)
Social status (employment, family)
Purchase of other products
Some data may be difficult to get; a data miner would use the attributes that are available
Multi-Class Classification
Target label is categorical
Examples
Recognise which instrument is playing
Violin
Piano
Drums
Driverless car (recognising the type of terrain)
Urban
Forest
Desert
Swamp
Personal email categories
Important
Travel arrangements
Offers
Full example
Domain
Personalisation
Goal
Organise incoming email according to the topic/relevance
From Unknown senders
Rationale
Personal assistance
Saving time
Increasing efficiency
Data
Target label
What type of email?
Categorical
Important
Travel
Marketing
Request for information
Possible attributes
Presence of some keywords
Message length
Mentions a time/date
Regression
Target is numeric
Examples
Predict House Price
Predict temperature for tomorrow
Determine the salary for a job candidate
Predict Olympic medal count
Full example
Domain
Economics/Social science
Task
Predicting Olympic medal counts for countries
Rationale
Understanding how the well-being of a country affects its Olympic performance; scientific curiosity
Data
Target
Medal count for a country in the Olympics
Numeric
Possible attributes
Number of internet users
Total GDP
Total Population
Latitude
Economic freedom index
What do we need for building predictive models?
Target/Label
Need to know what we want to predict
Performance score
To assess how good the model is
Input attributes
Think what kind of attributes may relate to the target label
It should be possible to collect this information for new data
Historical data
The target labels must be known in the historical data to be able to learn the relationship
Predictive Model (algorithm) and modelling tools
Model form (shape) [Predictive model]
Procedure: how to build a model from the data [Algorithm]
Dedicated software or a general-purpose programming language [Modelling tools]
Modelling Procedure
Build a predictive model from historical data
Pretend we don't know the target values (e.g. the selling prices) and predict them using this model
Check how good the predictions are by computing a performance score
e.g. the difference between the true and predicted values (see the sketch below)
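A minimal sketch of this three-step procedure in Python with scikit-learn; the house attributes, selling prices and choice of linear regression below are made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical data: [floor area (m^2), number of rooms] -> selling price (£)
X_hist = np.array([[50, 2], [70, 3], [90, 3], [120, 4], [150, 5]])
y_hist = np.array([150_000, 200_000, 240_000, 310_000, 380_000])

# 1. Build a predictive model from the historical data
model = LinearRegression().fit(X_hist, y_hist)

# 2. Pretend we don't know the selling prices and predict them with the model
y_pred = model.predict(X_hist)

# 3. Compute a performance score, e.g. the average absolute difference
#    between the true and predicted values
print(np.mean(np.abs(y_hist - y_pred)))
```

Scoring the model on the very data it was trained on is done here only to keep the sketch short; the next node explains why that is dangerous.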
Model Testing
Option 1
Test on the same data the model was trained on
Danger of over-fitting
Option 2
Split the historical data into several parts
Use one part for training and test on a different part
Typical procedures
Hold out test
Cross-validation
Leave-one-out-cross-validation
Issues with models
Over-fitting
Wave example
Every wave is slightly different
If the first wave is used as the model, the next wave will not be the same
We want to capture the generic underlying relationships in the data, and not one-off noise or occasional errors
Spam filter example
When your predictive model describes occasional errors or noise rather than the underlying relationship
Preventing over-fitting
Algorithm-specific techniques (regularisation)
Cross-validation
Reserving unseen data for testing
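As a sketch of an algorithm-specific technique, the snippet below contrasts an unregularised polynomial fit with a ridge-regularised one; the toy data, polynomial degree and `alpha` value are arbitrary choices for illustration, not values from the lecture:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=20)  # signal + noise

# A flexible model with no regularisation is free to chase the one-off noise
flexible = make_pipeline(PolynomialFeatures(degree=9), LinearRegression()).fit(X, y)

# The same model form with a ridge penalty discourages extreme coefficients,
# which tends to reduce over-fitting
regularised = make_pipeline(PolynomialFeatures(degree=9), Ridge(alpha=0.1)).fit(X, y)
```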
Under-fitting
Relations in data are more complex than the chosen model
If you have the choice between a simple and a complicated model, pick the simple one: it is less likely to over-fit
Building a good predictive model
Can be challenging
Intuition and experience guided search
Trial and error
Often one tries many alternative algorithms
Training-testing-training-testing
Try to make the model building process systematic
e.g.
Step changes in model parameters
Rethinking model choices
Revising attribute selection
Testing Procedures
Hold out evaluation
Split the historical data into two parts
Training data
Used for training a model
Testing data (hold out set)
Used for testing the model
Testing on the test set many times risks over-fitting
We may over-fit while doing model or feature selection
Unintentionally guiding the modelling process towards learning more and more of the noise in the data
Sign of over-fitting
Performance score measured on the training data is much better than on the new data
Possible Remedy
Put aside another test-set that would be used only once for final testing
What if the final testing is not good enough?
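A possible way to set this up with scikit-learn's train_test_split; `X` and `y` stand for the input attributes and target labels of the historical data and are assumed to be loaded already:

```python
from sklearn.model_selection import train_test_split

# Put aside a final test set that will be used only once, at the very end
X_rest, X_final, y_rest, y_final = train_test_split(X, y, test_size=0.2, random_state=0)

# Split the rest into training data and a hold-out set used during
# model and feature selection
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```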
Cross-validation
If we do not have a lot of data
We can use as much data as possible for training
By swapping training and testing sets
2 rounds → 2-fold cross-validation
Compute the average performance score over rounds 1 and 2
10-fold cross-validation
This way we can use all the historical data for testing (in turn)
n-fold cross-validation (n rounds)
Don't forget to mix the order of examples before splitting
For very small data-sets
Use leave-one-out testing
Only one example serves as testing data each round
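A sketch of both procedures with scikit-learn; `X` and `y` are again the historical attributes and labels (assumed loaded), and logistic regression is just a placeholder model:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

model = LogisticRegression(max_iter=1000)

# 10-fold cross-validation; shuffle=True mixes the order of examples before splitting
scores = cross_val_score(model, X, y, cv=KFold(n_splits=10, shuffle=True, random_state=0))
print(scores.mean())  # average performance score over the 10 rounds

# Leave-one-out for very small data sets: each round tests on a single example
loo_scores = cross_val_score(model, X, y, cv=LeaveOneOut())
print(loo_scores.mean())
```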
Evaluation
(Performance scores)
Classification accuracy/error
(applies to binary or multi-class classification)
The fraction of correct/incorrect predictions in all attempts
Classification accuracy = correct predictions / total number of predictions
7/10 = 70%
Classification error = wrong predictions / total number of predictions
3/10 = 30%
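The same 7-out-of-10 example computed with scikit-learn; the labels below are made up so that exactly 7 predictions are correct:

```python
from sklearn.metrics import accuracy_score

# Ten made-up predictions, 7 of them correct
y_true = ["yes", "yes", "yes", "no", "no", "no", "no", "yes", "yes", "no"]
y_pred = ["yes", "yes", "yes", "no", "no", "no", "no", "no",  "no",  "yes"]

accuracy = accuracy_score(y_true, y_pred)  # 7/10 = 0.7
error = 1 - accuracy                       # 3/10 = 0.3
```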
How good are predictions?
We can compare the performance to baselines
A baseline is the accuracy (or error) achieved by a "naive" prediction strategy
e.g.
Random guessing
Majority class
Classification accuracy = 5/10 = 50%
Accuracy of random guessing:
Binary classification: 50%
For a 3-class problem: 33%
For a 4-class problem: 25%
For a k-class problem: 1/k
Class Imbalance
Classes may have different sizes
e.g.
Many more healthy companies than bankrupt
Very few travel emails but lots of advertisements and meeting emails
Accuracy of majority class = 7/10 = 70%
Is this good?
If classes are imbalanced or one class is more important than the others we need more advanced evaluation measures
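One way to compute such baselines is scikit-learn's DummyClassifier; the tiny 7-vs-3 "healthy"/"bankrupt" data set below is invented to match the 70% majority-class example:

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Invented imbalanced data: 7 healthy vs 3 bankrupt companies (attributes are placeholders)
X = np.zeros((10, 1))
y = np.array(["healthy"] * 7 + ["bankrupt"] * 3)

# Majority-class baseline: always predict the most frequent class -> 7/10 = 70%
majority = DummyClassifier(strategy="most_frequent").fit(X, y)
print(accuracy_score(y, majority.predict(X)))

# Random-guessing baseline: about 1/k accuracy for k classes (here k = 2, so roughly 50%)
guess = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)
print(accuracy_score(y, guess.predict(X)))
```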
Confusion matrix
One class is more "interesting" than the other
e.g.
Relevant documents vs irrelevant
Fraudulent credit card transactions vs normal
Customers that subscribed following the offer vs other customers
We can look at what kind of mistakes are made in predictions
This is for binary classification, but a confusion matrix can be drawn for multi-class classification (for 3 classes - 3 rows and 3 columns)
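A sketch with scikit-learn's confusion_matrix; the labels are invented so that the counts (TP = 2, FP = 2, FN = 1, TN = 5) match the precision/recall example that follows:

```python
from sklearn.metrics import confusion_matrix

# "pos" is the interesting class (e.g. a fraudulent transaction)
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]

# Rows are true classes, columns are predicted classes:
# [[TP, FN],
#  [FP, TN]]
print(confusion_matrix(y_true, y_pred, labels=["pos", "neg"]))
# [[2 1]
#  [2 5]]
```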
Precision, recall, F-score
Precision
The fraction of true positives in all examples reported as positives
Precision = True Positive / (True Positive + False Positive)
Recall
The fraction of true positives in all examples that were actually positives in the testing set
Recall = True Positive / (True Positive + False Negative)
F-score
Combines the two
F-score = (2 * Recall * Precision) / (Recall + Precision)
e.g. airport security checks
Precision
How many out of those checked were carrying a gun
Recall
How many out of those who were actually carrying a gun were caught
Good: Precision, Recall, F-score is close to 1
Bad: Precision, Recall, F-score is close to 0
Precision = 2 / (2 + 2) = 50%
Recall = 2 / (2 + 1) ≈ 67%
F-score = 2 × 0.5 × 0.67 / (0.5 + 0.67) ≈ 0.57
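The same worked numbers reproduced with scikit-learn, reusing the invented labels from the confusion-matrix sketch above (TP = 2, FP = 2, FN = 1):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "neg", "neg"]

print(precision_score(y_true, y_pred, pos_label="pos"))  # 2/(2+2) = 0.50
print(recall_score(y_true, y_pred, pos_label="pos"))     # 2/(2+1) ≈ 0.67
print(f1_score(y_true, y_pred, pos_label="pos"))         # ≈ 0.57
```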
Cost sensitive evaluation
Different types of mistakes may be assigned different costs
e.g.
Classifying spam as ham is not as bad as classifying a relevant email as spam
Spam as ham costs £1
Ham as spam costs £5
Classification error = (10+20)/100 = 30%
Error cost = 10*£1 + 20*£5 = £110
Classification error = (25+5)/100 = 30%
Error cost = 25*£1 + 5*£5 = £50
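A short sketch of the cost calculation for the two models above; the mistake counts and the £1/£5 costs come from the example, while the helper function itself is just illustrative:

```python
COST_SPAM_AS_HAM = 1  # £1 for letting a spam message through
COST_HAM_AS_SPAM = 5  # £5 for losing a relevant email to the spam folder

def error_cost(spam_as_ham: int, ham_as_spam: int) -> int:
    """Total cost (in £) of the mistakes made on the test emails."""
    return spam_as_ham * COST_SPAM_AS_HAM + ham_as_spam * COST_HAM_AS_SPAM

# Both models make 30 mistakes out of 100 (30% error), but at very different costs
print(error_cost(10, 20))  # £110
print(error_cost(25, 5))   # £50
```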
Evaluation of regression
Mean absolute error (MAE)
MAE = sum(absolute(errors)) / number of predictions
MAE = (|-9|+|6|+|1|+|-2|+|-7|) / 5 = 5.0
The lower the MAE the better
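The MAE example recomputed in Python; the true values are made up, and the predictions are chosen so that the five errors are exactly -9, 6, 1, -2 and -7:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

errors = np.array([-9, 6, 1, -2, -7])         # the errors from the example above
y_true = np.array([100, 100, 100, 100, 100])  # made-up true values
y_pred = y_true - errors                      # predictions producing those errors

print(np.mean(np.abs(errors)))                # 5.0, by hand
print(mean_absolute_error(y_true, y_pred))    # 5.0, with scikit-learn
```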
Qualitative evaluation
Interpretability of predictions
White-box (interpretable) vs black-box models
White-box models (explainable predictions)
e.g.
Decision trees
Rules
Black-box models (model predictions are hard to explain)
e.g.
Neural networks
Support vector machines
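To illustrate why decision trees count as white-box models, the sketch below trains a small tree on scikit-learn's bundled iris data (an arbitrary choice) and prints it as human-readable rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# The learned model can be read directly as if-then rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```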
Robustness and stability
Does the model consistently perform well?
Or does it sometimes perform very well, but from time to time is extremely wrong?
Business Perspective
What predictions are going to be used for?
e.g.
Making very confident predictions for a small number of customers vs less confident predictions for many more
Predicting the direction of change in the exchange rate correctly vs minimising the absolute error