Chapter 3: Predictive Modeling
identifying informative attributes
Segmenting data by progressive attribute selection
Typically something we do not want to occur
Information reduces uncertainty
Douglas Beighle
Model: a simplified version of reality
Predictive model: a formula for estimating the target variable
Classification models
regression models
Supervised learning: model creation that finds a relationship between a set of variables and a predefined variable, the "target variable."
The fundamental concept: how do we know if a variable contains important information about the target variable?
Entropy: a purity measure that quantifies the disorder in a dataset
you want to reduce entropy
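for numeric variables, variance measures impurity
A perfectly even split between two classes gives the dataset an entropy of 1 (the maximum for a two-class target); a pure dataset has an entropy of 0.
Entropy is the basis for information gain: the reduction in entropy produced by splitting the dataset on an attribute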
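To make the entropy and information gain notes concrete, here is a minimal sketch in Python. It assumes base-2 logarithms and a categorical target; the function names `entropy` and `information_gain` and the toy "yes"/"no" labels are made up for illustration and are not from the chapter.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Each class contributes -p * log2(p), where p is its share of the data.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy labels (assumption, for illustration only).
even = ["yes"] * 5 + ["no"] * 5   # perfectly mixed two-class set
pure = ["yes"] * 10               # perfectly homogeneous set

print(entropy(even))
print(entropy(pure))
# A split that separates the classes completely recovers all of the parent entropy.
print(information_gain(even, [["yes"] * 5, ["no"] * 5]))
```

A split whose segments are as mixed as the parent yields a gain of about zero, which is why such an attribute is considered uninformative.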
Tree structured models
multiple attribute selection
each leaf contains a value for the target variable
each leaf contains a segment classification
Leaves should be homogeneous
Attributes/ target variable
Trees can also create a set of rules. If/then statements
probability rather than a definitive yes/no
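The points about rules and probabilities can be sketched with a small hand-built tree. Everything here is hypothetical: the attribute names (`employed`, `owns_home`), the leaf probabilities, and the helper names `extract_rules` and `predict_probability` are invented for illustration, not taken from the chapter.

```python
# Hypothetical hand-built tree: internal nodes test one attribute,
# leaves hold an estimated probability of the positive class.
tree = {
    "attribute": "employed",
    "branches": {
        "yes": {"probability": 0.1},
        "no": {
            "attribute": "owns_home",
            "branches": {
                "yes": {"probability": 0.4},
                "no": {"probability": 0.8},
            },
        },
    },
}

def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path into one IF/THEN rule."""
    if "probability" in node:
        clause = " AND ".join(conditions) or "TRUE"
        return [f"IF {clause} THEN p = {node['probability']}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules.extend(extract_rules(child, conditions + (test,)))
    return rules

def predict_probability(node, record):
    """Follow the record's attribute values down to a leaf."""
    while "probability" not in node:
        node = node["branches"][record[node["attribute"]]]
    return node["probability"]

for rule in extract_rules(tree):
    print(rule)
print(predict_probability(tree, {"employed": "no", "owns_home": "no"}))
```

Each leaf corresponds to exactly one rule, and the leaf stores a probability estimate instead of a hard yes/no label.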
quality of the variables individually
Highest information gain feature (HOUSE)
root of the tree
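Choosing the root can be sketched as ranking attributes by information gain and taking the best one. The six records below are fabricated toy data (the `house`, `income`, and `churn` column names and all values are assumptions), built so that `house` separates the target perfectly; they are not the chapter's dataset.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gain_for_attribute(records, attribute, target):
    """Information gain from segmenting the records by one attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record[target])
    parent = [record[target] for record in records]
    weighted = sum(len(g) / len(parent) * entropy(g) for g in groups.values())
    return entropy(parent) - weighted

# Fabricated toy records (assumption, for illustration only).
records = [
    {"house": "rent", "income": "high", "churn": "yes"},
    {"house": "rent", "income": "low",  "churn": "yes"},
    {"house": "rent", "income": "high", "churn": "yes"},
    {"house": "own",  "income": "high", "churn": "no"},
    {"house": "own",  "income": "low",  "churn": "no"},
    {"house": "own",  "income": "low",  "churn": "no"},
]

gains = {a: gain_for_attribute(records, a, "churn") for a in ("house", "income")}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(gains)
print(root)
```

The same ranking step can then be repeated inside each resulting segment to pick the next attribute, which is how the tree grows by progressive attribute selection.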