Chapter 3
Supervised segmentation
Segment groups that differ from each other
So that we can predict or estimate something
e.g. which customers will leave or respond to an ad
Informative variables are attributes that carry information about the target
Information is a quantity that reduces uncertainty
Attributes correlate with the target value
Model
A simplified representation of reality to serve a purpose
Data science predictive model: estimates the value of the target
Prediction: estimating an unknown value
Supervised learning
Describes the relationship between selected variables and the target
instance (row): a fact or data point
Descriptive modeling: gain insight into the underlying process
Many names for same things
Dataset = Table = Spreadsheet
Examples or Instances = Rows
Independent Variable = Explanatory Variable
Input data used in induction is known as training data
A.K.A. labeled data, because the value of the target variable is known
Attribute Selection with Information Gain
Example: Are mushrooms Edible or poisonous?
Dataset is slightly unbalanced: entropy is 0.96 (quick numeric check below)
Want to reduce entropy, i.e. the shaded area in the book's entropy chart
Odor reduces entropy by 0.1, so it is an informative attribute
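A quick sanity check of the 0.96 figure, as a hedged Python sketch. The exact class counts are not in these notes, so a roughly 62/38 edible/poisonous split is assumed purely for illustration:

```python
import math

def entropy(p):
    """Entropy of a two-class set, given the proportion p of one class."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# A slightly unbalanced split of roughly 62% / 38% gives entropy close to 0.96.
print(round(entropy(0.62), 2))  # ~0.96
```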
Informative Attributes
Want resulting groups to be pure
Pure: homogeneous with respect to the target variable
Attributes rarely split groups perfectly
Some attributes are non-binary and some are numeric
Can evaluate splits with a purity measure
The most common splitting criterion is Information Gain
It is based on a purity measure: entropy
Entropy: A measure of disorder applied to a set
Disorder corresponds to how mixed the segment is with respect to the target variable (sketched in code below)
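A minimal Python sketch of entropy and information gain as described above; the labels and the toy split are made up for illustration, not taken from the book's data:

```python
from collections import Counter
import math

def entropy(labels):
    """Disorder of a set of target labels: 0 = pure, 1 = maximally mixed (binary case)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, child_groups):
    """Entropy of the parent set minus the size-weighted entropy of the child segments."""
    n = len(parent_labels)
    children = sum(len(g) / n * entropy(g) for g in child_groups)
    return entropy(parent_labels) - children

# Toy example: 10 instances split into two groups by some attribute's values.
parent = ['yes'] * 6 + ['no'] * 4
groups = [['yes'] * 5 + ['no'] * 1, ['yes'] * 1 + ['no'] * 3]
print(round(information_gain(parent, groups), 3))  # purer children -> larger gain
```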
Supervised Segmentation with Tree-Structured Models
Segmentation of Data is like a tree
Root is the top
Made up of interior and terminal nodes
Each node contains a distinct value of an attribute
Each "branch" ends in a terminal node
Each leaf is a segment
Tree induction is easy to understand
Included in most data mining packages
Takes a divide and conquer approach
Start with the whole data set
Then apply variable selection to create the purest subgroups, and repeat within each subgroup (sketch below)
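A hedged sketch of the divide-and-conquer induction using scikit-learn's DecisionTreeClassifier with the entropy criterion; the synthetic data and the attribute names attr_0..attr_2 are assumptions for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data: 200 instances, 3 numeric attributes, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Divide and conquer: each split is chosen by information gain (entropy criterion).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)
tree.fit(X, y)
print(export_text(tree, feature_names=["attr_0", "attr_1", "attr_2"]))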
Visualizing Segmentation
Lines separating regions are Decision surfaces
Each node of the classification tree tests a variable against a fixed value (visual sketch below)
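A small illustrative sketch, using assumed synthetic data rather than the book's, showing the axis-parallel decision surfaces a classification tree produces over two numeric attributes:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeClassifier

# Two numeric attributes so the segmentation can be drawn in the plane.
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 2))
y = ((X[:, 0] > 0.4) & (X[:, 1] > 0.6)).astype(int)

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2).fit(X, y)

# Evaluate the tree on a grid; the colour boundaries are the decision surfaces,
# and they are axis-parallel because each node tests one variable against a fixed value.
xx, yy = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))
zz = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, s=10)
plt.xlabel("attribute 1")
plt.ylabel("attribute 2")
plt.show()
```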
Probability Estimation
Want models to predict more than just a classification
Might want to see the probabilities
Problems with Probabilities
Some leaves might give a 100% probability because they contain only a single instance
This is a case of overfitting (illustrated below)
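A tiny sketch of frequency-based probability estimation at a leaf, showing how a single-instance leaf claims 100%. The Laplace-corrected variant shown alongside is a common smoothing fix; it is included as an aside and uses the standard binary-class formula:

```python
def leaf_probability(pos, neg, laplace=False):
    """Class-probability estimate at a leaf from its instance counts.

    laplace=True applies Laplace correction (add-one smoothing), a common fix
    for tiny leaves; it is an aside here, not something covered in these notes.
    """
    if laplace:
        return (pos + 1) / (pos + neg + 2)
    return pos / (pos + neg)

# A leaf holding a single positive instance claims 100% probability...
print(leaf_probability(1, 0))                 # 1.0  (overfit to one example)
# ...while the smoothed estimate is far less extreme.
print(leaf_probability(1, 0, laplace=True))   # ~0.67
```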
Churn Example
Start by measuring the information gain of each variable
Place the variable with the highest information gain at the root
The other nodes are still important, because their information gains depend on the splits made above them