Introduction to Predictive Modeling: From Correlation to Supervised Segmentation
Models, Induction, & Prediction
Simplified representation of reality that serves a purpose
Ex: a map is a model of the physical world
Predictive model: formula for estimating the unknown value of interest (target)
Gain Insight into underlying process
May be judged solely on performance
Supervised learning
Describes relationship between a set of selected variables & a predefined variable, the target variable
Estimates value of target as a function of the selected variables (features)
Instance or example represents a data point
Sometimes called feature vector
Model Induction
Creation of models from data
Induction algorithm, procedure that creates the model
Training data, input data for the induction algorithm
Supervised Segmentation
Subgroups have different values for the target variable
Segment population into subgroups
Ex: middle-aged professionals who reside in NYC
How to judge if a variable contains important info about the target variable (TV)?
Rank variables by how good at predicting value of target
Selecting Informative Attributes
Binary stick person example
Want resulting groups to be pure, homogeneous, with respect to TV
If even one member has a different TV value, entire group is impure
Rarely find pure data
Make it as pure as possible
Attributes rarely split a group perfectly
Not all attributes binary
Many have 3+ distinct values
Some attributes take on numeric values
Formula evaluates how well attribute splits segments
Formula based on purity measure
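The purity measure usually used here is entropy, and the splitting formula built on it is information gain (IG). Written out, with the symbol names being a notational choice made for this note: p_i is the proportion of instances in set S having target value i, and c_1 ... c_k are the child segments produced by splitting the parent on one attribute.

```latex
\[
\mathrm{entropy}(S) = -\sum_{i} p_i \log_2 p_i
\]
\[
\mathrm{IG}(\mathrm{parent}, \mathrm{children})
  = \mathrm{entropy}(\mathrm{parent})
  - \sum_{j=1}^{k} p(c_j)\,\mathrm{entropy}(c_j)
\]
```

Here p(c_j) is the proportion of the parent's instances that fall into child c_j. A pure segment has entropy 0; a two-class segment split 50/50 has entropy 1.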
Attribute Selection w/ Information Gain (IG)
Rank by IG to simplify
Mushroom Example
Target Var. - edible
Values - yes (edible) no (poisonous)
1st calculate entropy of the entire dataset
Entropy of entire dataset: 0.96
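Ranking attributes by IG can be sketched in a few lines of Python. This is a minimal illustration, not the procedure from the original example: the rows below are hypothetical toy data (not the real mushroom dataset), and the attribute names `odor` and `cap` and the helper names are assumptions made for the sketch.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of target values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    parent = [row[target] for row in rows]
    children = {}
    for row in rows:
        children.setdefault(row[attribute], []).append(row[target])
    weighted = sum(len(c) / len(rows) * entropy(c) for c in children.values())
    return entropy(parent) - weighted


# Hypothetical toy rows, NOT the real mushroom data.
rows = [
    {"odor": "none", "cap": "flat", "edible": "yes"},
    {"odor": "none", "cap": "bell", "edible": "yes"},
    {"odor": "foul", "cap": "flat", "edible": "no"},
    {"odor": "foul", "cap": "bell", "edible": "no"},
]

# Rank attributes from most to least informative about the target.
ranking = sorted(
    ["odor", "cap"],
    key=lambda a: information_gain(rows, a, "edible"),
    reverse=True,
)
print(ranking)
```

In this toy data `odor` separates the classes perfectly (IG of 1 bit) while `cap` tells us nothing (IG of 0), so `odor` ranks first.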
With Tree-Structured Models
How to merge highest IG attributes?
Segments of data take form of a tree
Classification trees used as predictive models
Nonleaf nodes are referred to as "decision nodes"
Provides a model that represents the sort of supervised segmentation we want
Divide-and-conquer approach
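The divide-and-conquer idea can be sketched as a short recursive function: pick the attribute with the highest IG, split the data on it, and repeat on each subgroup until it is pure or no attributes are left. This is a simplified sketch for categorical attributes only; the function names, the nested-dict tree representation, and the toy rows are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of target values."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(rows, attribute):
    """Group rows by their value for one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row)
    return groups


def build_tree(rows, attributes, target):
    labels = [row[target] for row in rows]
    # Leaf: the segment is pure or nothing is left to split on.
    # The leaf stores the majority class of the segment.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(attribute):
        groups = split(rows, attribute).values()
        weighted = sum(
            len(g) / len(rows) * entropy([r[target] for r in g]) for g in groups
        )
        return entropy(labels) - weighted

    best = max(attributes, key=gain)
    remaining = [a for a in attributes if a != best]
    # Decision node: one subtree per observed value of the chosen attribute.
    return {
        best: {
            value: build_tree(group, remaining, target)
            for value, group in split(rows, best).items()
        }
    }


# Hypothetical toy data with pre-binned attribute values.
rows = [
    {"balance": ">=50k", "age": ">=50", "class": "no write-off"},
    {"balance": ">=50k", "age": "<50", "class": "no write-off"},
    {"balance": ">=50k", "age": "<50", "class": "no write-off"},
    {"balance": "<50k", "age": ">=50", "class": "no write-off"},
    {"balance": "<50k", "age": "<50", "class": "write-off"},
    {"balance": "<50k", "age": "<50", "class": "write-off"},
]
tree = build_tree(rows, ["balance", "age"], "class")
print(tree)
```

On this toy data `balance` has the higher IG and becomes the root; the `<50k` branch is still impure, so it is split again on `age`. Numeric attributes would need a threshold search that this sketch leaves out.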
Visualizing Segmentations
Only possible to visualize 2-3 dimensions
Trees as set of rules
Trace down single path from root node to leaf collecting conditions as we go
Consists of attribute tests along the path connected with AND
Ex: IF (Balance < 50k) AND (Age < 50) THEN Class=Write-off
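Reading a tree as a rule set is mechanical: walk every path from the root to a leaf and join the attribute tests along the way with AND. The sketch below assumes a tree stored as nested dicts (attribute, then test, then subtree or leaf class); that representation, the function name, and the example tree are hypothetical and chosen only for illustration.

```python
def tree_to_rules(tree, conditions=()):
    """Yield one IF ... THEN rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        # Leaf reached: emit the conditions collected along the path.
        yield "IF " + " AND ".join(conditions) + f" THEN Class={tree}"
        return
    # Each decision node is a single-key dict: {attribute: {test: subtree}}.
    ((attribute, branches),) = tree.items()
    for test, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + (f"({attribute} {test})",))


# Hypothetical tree; leaves are class labels.
tree = {
    "Balance": {
        "< 50k": {"Age": {"< 50": "Write-off", ">= 50": "No Write-off"}},
        ">= 50k": "No Write-off",
    }
}

rules = list(tree_to_rules(tree))
for rule in rules:
    print(rule)
```

Each leaf produces exactly one rule, so this tree with three leaves gives three rules.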
Example of addressing the churn problem with tree induction
Probability Estimation
Prefer probability estimates rather than class predictions
Can use in a more sophisticated decision-making process
Want each segment (leaf of tree) to be assigned a probability
Frequency-based estimate
Could lead to overfitting
Binary class probability (Laplace correction)
p(c) = (n + 1) / (n + m + 2)
n = number of examples in the leaf belonging to class c, m = number of examples in the leaf not belonging to class c
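A small Python comparison shows why the smoothed estimate is less prone to overfitting than the raw frequency. Here n is the number of examples in a leaf that belong to class c and m is the number that do not; the function names and the example counts are assumptions made for this sketch.

```python
def frequency_estimate(n, m):
    """Raw frequency-based estimate: share of leaf examples in class c."""
    return n / (n + m)


def laplace_estimate(n, m):
    """Smoothed binary estimate: p(c) = (n + 1) / (n + m + 2)."""
    return (n + 1) / (n + m + 2)


# A tiny leaf (2 examples) and a larger leaf (20 examples), both all class c.
for n, m in [(2, 0), (20, 0)]:
    print(n, m, frequency_estimate(n, m), round(laplace_estimate(n, m), 3))
```

Both leaves get a raw frequency of 1.0, but the smoothed estimate is 0.75 for the two-example leaf and about 0.955 for the twenty-example leaf, so the estimate moves toward the raw frequency only as evidence accumulates.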