Chapter 10.3: Transforms
Operators Applied to Two Columns
Not Equal: !=
When 2 data points are not the same, this can have an effect on a prediction
EX: the vibration of a machine during operation is different from its vibration the day before
Equal: ==
If 2 data points are the same, they may cancel each other out in some cases
Or, they may indicate a higher likelihood of a target phenomenon occurring
Greater Than or Equal: >=
EX: if # of available seats in a car is greater than or equal to family size, the likelihood of purchasing a new van may be lower
Exponentiation: **
When dealing with financial data, the current value of a continuously compounded loan or bond may be of interest
To represent P = Ce^(rt) in data transformations, C * e**(r * t) can be used to capture the exponential relationship between e and (r * t)
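The continuous-compounding formula P = Ce^(rt) can be sketched as a row-by-row transform. This is a minimal illustration using plain Python lists as stand-ins for columns; the column names (principal, rate, years) and their values are made up for the example and do not come from the chapter.

```python
import math

# Hypothetical columns: principal C, annual rate r, elapsed years t.
principal = [1000.0, 2500.0, 400.0]
rate = [0.05, 0.03, 0.10]
years = [2.0, 10.0, 0.5]

# New column: P = C * e**(r * t), computed for each row.
value = [c * math.e ** (r * t) for c, r, t in zip(principal, rate, years)]
```

Each entry of value depends on all three input columns at once, which is the relationship a model may not recover from the raw columns alone.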
Greater Than: >
EX: if the family unit is larger than the # of seats in the largest car owned, summer camping trips to national parks become less likely
Less Than or Equal: <=
EX: if # of bedrooms in a family's house is smaller than or equal to the family size one year after birth of new child, this feature may be predictive of the purchase of new bunk beds
Less Than: <
EX: if # of seats available in a family's largest car is smaller than their family size after the birth of a new child, this feature may be predictive of the purchase of a new car
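The comparison operators above each turn two numeric columns into one true/false column. A minimal sketch, assuming two made-up columns (seats and family_size) loosely based on the car examples:

```python
# Hypothetical columns: seats in each family's largest car, and family size.
seats = [5, 7, 4, 5]
family_size = [4, 7, 5, 5]

pairs = list(zip(seats, family_size))

# Each comparison yields a new boolean feature column.
seats_ge_family = [s >= f for s, f in pairs]  # enough seats for everyone
seats_lt_family = [s < f for s, f in pairs]   # car is too small
seats_eq_family = [s == f for s, f in pairs]  # exactly full
seats_ne_family = [s != f for s, f in pairs]  # any mismatch
```

Only one of these columns is usually needed per pair of inputs; which one depends on how the target is expected to react to the comparison.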
Division: /
Dividing one column by another can sometimes expose info that is otherwise hidden from some algorithms
EX: income divided by # of kids might reveal aspects of available funds of a family
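The income-per-kid ratio might be computed as below. The data is invented, and returning None for families with zero kids is an assumption made here to avoid dividing by zero; the chapter does not say how that case should be handled.

```python
# Hypothetical columns: household income and number of kids.
income = [60000.0, 90000.0, 45000.0]
kids = [2, 3, 0]

# Ratio feature; None marks rows where the ratio is undefined (no kids).
income_per_kid = [
    inc / k if k > 0 else None
    for inc, k in zip(income, kids)
]
```

The first two families have different incomes but the same funds per kid, which is the kind of signal the raw columns do not show directly.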
Multiplication: *
Sometimes: Two related columns interact w/ a target in a way that can only be detected through their product
EX: having a bad interaction with customer service doesn't mean that a customer will cancel their relationship (churn). But if the customer always has a bad temper, a bad interaction is more likely to result in churn
Interaction effect = moderated relationship between column and target
Moderation comes from the size of another feature
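The churn example can be sketched as a product of two columns. The names bad_interaction (0 or 1) and temper (a score between 0 and 1) are hypothetical, as is the idea that temper is available as a numeric score.

```python
# Hypothetical columns: whether a bad service interaction occurred (0/1)
# and a temper score for the customer (0 = calm, 1 = short-tempered).
bad_interaction = [1, 1, 0, 0]
temper = [0.9, 0.1, 0.9, 0.1]

# Interaction feature: large only when both conditions hold together.
interaction = [b * t for b, t in zip(bad_interaction, temper)]
```

The product is high only for the first customer, where a bad interaction coincides with a bad temper; neither input column alone singles out that row.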
Absolute: Abs()
Similar to subtraction: used in cases where the actual distance between two numbers, rather than whether it's negative or positive, is of importance
EX: Carnival people trying to guess your weight
Subtraction: -
Subtracting one column from another makes the similarity or difference between them more apparent
The closer the number is to zero, the more similar
EX: to predict if someone is likely to engage in outside activities -- subtracting their preferred inside temp from current outside temp can be an indicator. The closer to zero the more likely they will go outside
Only care about positive, negative, how close to zero
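A minimal sketch of the temperature example, showing both the signed difference and its absolute value. The column names and readings are made up for illustration.

```python
# Hypothetical columns, in degrees Celsius.
preferred_inside = [21.0, 22.0, 20.0]
current_outside = [19.0, 30.0, 20.0]

# Subtraction keeps the sign: negative means colder outside than preferred.
temp_diff = [out - inside for out, inside in zip(current_outside, preferred_inside)]

# Abs() keeps only the distance from zero.
temp_gap = [abs(d) for d in temp_diff]
```

Whether to keep the signed column, the absolute one, or both depends on whether the direction of the difference is expected to matter for the target.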
Addition: +
Adding different columns together can strengthen predictive signals
EX: adding a person's # of spouses (usually one) to their # of children, plus one for the focal person, gives family size. This could be helpful to understand the person's behavior in certain situations (such as not being able to afford flights with a big family)
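The family-size feature described above could be built like this; the two input columns and their values are hypothetical.

```python
# Hypothetical columns: number of spouses and number of children per person.
spouses = [1, 0, 1]
children = [2, 0, 4]

# Family size = spouses + children + 1 for the focal person.
family_size = [s + c + 1 for s, c in zip(spouses, children)]
```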
Operators Applied to One Column
Square Root: Sqrt()
Similar to log transformations
Works for different distributions of data
Compare its ability to linearize the data against that of log
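One rough way to compare the two transforms is to check which one produces evenly spaced values. The sketch below uses invented quadratic data, for which the square root is the better fit; the helper steps() is introduced here only for illustration.

```python
import math

# Hypothetical data that grows quadratically with x.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]

sqrt_y = [math.sqrt(v) for v in y]
log_y = [math.log(v) for v in y]

def steps(col):
    """Differences between consecutive values; constant steps mean a straight line."""
    return [b - a for a, b in zip(col, col[1:])]

sqrt_steps = steps(sqrt_y)  # constant, so sqrt linearizes this data
log_steps = steps(log_y)    # shrinking, so log over-corrects here
```

On real data the steps will not be exactly constant, so a plot or a fit statistic would be a more practical way to make the same comparison.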
Square: Square()
Makes large values even larger
EX: if a column you are working on contains the standard deviation of an engine's vibrations, you might square it to find the variance
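The standard-deviation example amounts to squaring each value in the column. A short sketch with made-up readings:

```python
# Hypothetical column: standard deviation of each engine's vibrations.
vibration_std = [0.5, 1.2, 3.0]

# Squaring gives the variance and stretches the largest values the most.
vibration_var = [s ** 2 for s in vibration_std]
```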
Natural Logarithm: Log()
Generally used to linearize exponential data
EX: the higher a family's total income, the less likely they may visit national parks because they could afford other experiences (personal interests still need to be taken into account)
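The point that a log transform linearizes exponential data can be checked on a small synthetic series. The starting value 100 and growth rate 0.5 are arbitrary choices for this sketch.

```python
import math

# Hypothetical exponential series: y = 100 * e**(0.5 * t).
t = [0, 1, 2, 3, 4]
y = [100 * math.e ** (0.5 * v) for v in t]

# After the natural log, y becomes log(100) + 0.5 * t, a straight line in t.
log_y = [math.log(v) for v in y]

# Consecutive differences are all (approximately) the growth rate 0.5.
log_steps = [b - a for a, b in zip(log_y, log_y[1:])]
```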