Chapter 3: Predictive Modeling

identifying informative attributes

Segmenting data by progressive attribute selection

The target is typically something we do not want to occur

Information reduces uncertainty

Douglas Beighle

Model: a simplified version of reality

Predictive model: a formula for estimating the target variable

Classification models

regression models

Supervised learning: model creation finds a relationship between a set of variables and a predefined variable, the "target variable."

The fundamental concept: how do we know if a variable contains important information about the target variable?

Entropy: a purity measure that quantifies the disorder in a dataset

you want to reduce entropy

for numeric variables, variance measures impurity
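
A minimal sketch of variance as an impurity measure for a numeric target. The values and the split are made up for illustration; the point is only that a good split leaves segments whose size-weighted variance is lower than the variance of the whole set.

```python
from statistics import pvariance

def weighted_variance(segments):
    """Size-weighted average of the population variance of each segment."""
    total = sum(len(s) for s in segments)
    return sum(len(s) / total * pvariance(s) for s in segments)

# Hypothetical numeric target values before and after a split.
parent = [10.0, 20.0, 30.0, 40.0]
segments = [[10.0, 20.0], [30.0, 40.0]]

parent_impurity = pvariance(parent)            # spread of the unsplit data
split_impurity = weighted_variance(segments)   # spread remaining after the split
variance_reduction = parent_impurity - split_impurity
```

A larger variance reduction means the split produced more homogeneous numeric segments.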

A perfectly even split between two classes gives the dataset an entropy of 1 (the maximum for a two-class target, using log base 2); a pure dataset has an entropy of 0.
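
A small sketch of the entropy calculation, assuming the usual Shannon definition with log base 2: entropy = -sum(p_i * log2(p_i)) over the classes. The function name and the toy labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())

# Toy label sets: an even two-class mix and a completely pure set.
balanced = ["yes", "no", "yes", "no"]
pure = ["yes", "yes", "yes", "yes"]

balanced_entropy = entropy(balanced)  # maximal disorder for two classes
pure_entropy = entropy(pure)          # no disorder at all
```

The balanced set is maximally impure for two classes, while the pure set has nothing left to reduce.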

Entropy is used to compute information gain: how much an attribute reduces entropy when the dataset is split on it
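
Information gain can be sketched as the parent's entropy minus the size-weighted entropy of the segments a split produces. This is an illustration under that standard definition; the helper names and the toy splits are assumptions, not taken from the chapter.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child segments."""
    total = len(parent)
    weighted = sum(len(c) / total * entropy(c) for c in children)
    return entropy(parent) - weighted

# Toy data: one split separates the classes perfectly, the other not at all.
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)
```

An attribute that yields pure segments recovers all of the parent's entropy; one that leaves each segment as mixed as the parent gains nothing.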

Tree structured models

multiple attribute selection

each leaf contains a value for the target variable

each leaf contains a segment classification

Leaves should be homogeneous

Attributes/ target variable

Trees can also be read as a set of rules: if/then statements
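
One way to picture the tree-to-rules idea: follow every path from the root to a leaf and join the tests along the path with AND. The nested-dict tree layout, the attribute names, and the class labels below are all hypothetical, chosen only to keep the sketch short.

```python
def tree_to_rules(node, conditions=()):
    """Walk a nested-dict tree and return one if/then rule per leaf."""
    if "leaf" in node:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN class = {node['leaf']}"]
    rules = []
    for value, child in node["branches"].items():
        test = f"{node['attribute']} = {value}"
        rules += tree_to_rules(child, conditions + (test,))
    return rules

# Hypothetical two-level tree (attribute names and classes are made up).
toy_tree = {
    "attribute": "employed",
    "branches": {
        "no": {"leaf": "default"},
        "yes": {
            "attribute": "balance",
            "branches": {
                "low": {"leaf": "default"},
                "high": {"leaf": "no default"},
            },
        },
    },
}

rules = tree_to_rules(toy_tree)
for rule in rules:
    print(rule)
```

Each leaf yields exactly one rule, so the rule set and the tree describe the same segmentation.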

probability rather than a definitive yes/no
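
A simple way to get a probability from a leaf, assuming the frequency-based estimate: use the share of each class among the training instances that fall into that leaf. The labels below are invented for the example.

```python
from collections import Counter

def leaf_probabilities(labels):
    """Estimate class probabilities as class frequencies within one leaf."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}

# Hypothetical leaf holding four training instances.
leaf = ["churn", "churn", "churn", "stay"]
probs = leaf_probabilities(leaf)
```

Instead of labeling everyone in this leaf "churn", the model can report an estimated churn probability for the segment.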

quality of the variables individually

Highest information gain feature (HOUSE)

root of the tree
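
A sketch of how the root attribute might be chosen: compute the information gain of each attribute individually and take the largest. The rows, the second attribute ("plan"), and the target ("leave") are made-up toy data, not the chapter's dataset; only the attribute name "house" echoes the note above.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain_for_attribute(rows, attribute, target):
    """Information gain from splitting rows on a single attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    parent = [row[target] for row in rows]
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(parent) - weighted

# Made-up toy rows for illustration only.
rows = [
    {"house": "high", "plan": "a", "leave": "no"},
    {"house": "high", "plan": "b", "leave": "no"},
    {"house": "low", "plan": "a", "leave": "yes"},
    {"house": "low", "plan": "b", "leave": "yes"},
]

gains = {a: gain_for_attribute(rows, a, "leave") for a in ("house", "plan")}
root = max(gains, key=gains.get)  # attribute with the highest gain
```

The same selection can then be repeated inside each resulting segment to grow the rest of the tree.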