Provost: Ch 3 Intro to Predictive Modeling
Models, Induction, and Prediction
Predictive model = formula for estimating the unknown value of interest (the target)
Model = representation of reality created to serve a purpose
Induction = procedure that creates the model from the data
Input data is called training data
Supervised Segmentation
Select informative attributes
Want the resulting groups to be pure (homogeneous with respect to target variable)
Real data rarely ends with pure segments
Attributes rarely split a group perfectly
Not all attributes are binary
Some attributes take on numeric values
Entropy = measure of disorder that can be applied to a set: entropy = -p1 log2(p1) - p2 log2(p2) - ..., where pi is the proportion of the set having property i
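Information gain = reduction in entropy from splitting a set on an attribute: entropy(parent) minus the sum of entropy(child) over the resulting segments, each weighted by its share of the instances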
Regression (numeric target): measure impurity with variance, not entropy
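The impurity measures above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: the function names and the toy data are assumptions made for the example, and entropy is taken in bits (log base 2).

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a collection of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


def variance(values):
    """Population variance, the impurity measure for a numeric target."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, children):
    """Parent variance minus the size-weighted variance of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * variance(child) for child in children)
    return variance(parent) - weighted


# Toy data (hypothetical): 5 "yes" and 5 "no" instances, split two ways.
parent = ["yes"] * 5 + ["no"] * 5
pure_split = [["yes"] * 5, ["no"] * 5]
mixed_split = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]

gain_pure = information_gain(parent, pure_split)    # perfectly pure segments
gain_mixed = information_gain(parent, mixed_split)  # barely informative split

# Toy numeric target (hypothetical) for the regression case.
numeric_parent = [10, 12, 30, 32]
numeric_children = [[10, 12], [30, 32]]
reduction = variance_reduction(numeric_parent, numeric_children)
```

An attribute that separates the classes perfectly recovers all of the parent's entropy, while one that leaves the segments nearly as mixed as the parent yields a gain close to zero. The same comparison works for numeric targets with variance in place of entropy.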
Tree-Structured Models:
Decision tree
Node: test of an attribute
Leaf: terminal node that corresponds to a segment
Rule = IF...AND...THEN... = Leaf
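The correspondence between leaves and rules can be illustrated with a small hand-built tree. This is only a sketch: the nested-dict representation, the attribute names (employed, balance), and the segment labels are hypothetical choices for the example, not something the notes prescribe.

```python
# Hypothetical tree: each internal node tests one attribute,
# each leaf names the segment it corresponds to.
tree = {
    "attribute": "employed",
    "branches": {
        "yes": {"leaf": "no write-off"},
        "no": {
            "attribute": "balance",
            "branches": {
                "<50k": {"leaf": "write-off"},
                ">=50k": {"leaf": "no write-off"},
            },
        },
    },
}


def classify(node, instance):
    """Follow the attribute tests from the root until a leaf is reached."""
    while "leaf" not in node:
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node["leaf"]


def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF ... AND ... THEN ... rule."""
    if "leaf" in node:
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN {node['leaf']}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules


rules = extract_rules(tree)
prediction = classify(tree, {"employed": "no", "balance": "<50k"})
```

The tree has three leaves, so `extract_rules` returns three rules, one per leaf. The conditions of each rule are the tests along the path from the root to that leaf.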
Probability Estimation Tree
Leaf: Probability of membership in this segment
Frequency-based estimate: if a leaf contains n positive and m negative training instances, the probability that a new instance in that segment is positive is estimated as n/(n+m)
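A frequency-based estimate at a leaf is simply the share of the leaf's training instances that belong to the class of interest. The sketch below assumes the training labels that fell into one leaf are available as a list. The function name and the toy counts are made up for illustration.

```python
from collections import Counter


def leaf_probability(leaf_labels, positive):
    """Fraction of the training instances in this leaf that are positive."""
    counts = Counter(leaf_labels)
    return counts[positive] / len(leaf_labels)


# Hypothetical leaf holding 7 positive and 3 negative training instances.
leaf = ["write-off"] * 7 + ["no write-off"] * 3
p_write_off = leaf_probability(leaf, "write-off")
```

With so few instances in a leaf the estimate can be unreliable, so the count behind each probability is worth keeping alongside it.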