Transformations
Operations Applied to Two Columns
Less Than <
If the number of seats available in a family’s largest car is smaller than their family size after the birth of a new child, this feature may be predictive of the purchase of a new car.
Less Than or Equal <=
If the number of bedrooms in a family’s home is smaller than or equal to the family size one year after the birth of a new child, this feature may be predictive of the purchase of new bunk beds.
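As a minimal sketch of the Less Than and Less Than or Equal comparisons, the snippet below turns two columns into 0/1 flag features. The column names and values are hypothetical, and plain Python lists stand in for table columns to keep the example dependency-free.

```python
# Hypothetical columns, one entry per family (illustrative values only).
seats_in_largest_car = [5, 5, 7, 4]
family_size_after_birth = [4, 6, 7, 5]

# Less Than: 1 when the car has fewer seats than the family needs, else 0.
seats_lt_family = [
    int(seats < size)
    for seats, size in zip(seats_in_largest_car, family_size_after_birth)
]

# Less Than or Equal: the same comparison, but ties also count.
seats_le_family = [
    int(seats <= size)
    for seats, size in zip(seats_in_largest_car, family_size_after_birth)
]

print(seats_lt_family)  # [0, 1, 0, 1]
print(seats_le_family)  # [0, 1, 1, 1]
```

The third family (7 seats, 7 members) is the only row where the two flags disagree, which is exactly the boundary case that separates the two operators.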
Division /
Dividing one column by another can sometimes expose information, such as a ratio, that is otherwise hidden from some types of algorithms.
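A ratio feature might be sketched as follows. The debt and income columns are hypothetical, and returning None for a zero denominator is just one possible convention, assumed here for illustration.

```python
# Hypothetical columns (illustrative values only).
total_debt = [20000.0, 5000.0, 0.0, 12000.0]
annual_income = [50000.0, 25000.0, 40000.0, 0.0]

# Division: debt-to-income ratio per row.
# A zero denominator is mapped to None (an assumed convention) instead of raising.
debt_to_income = [
    debt / income if income != 0 else None
    for debt, income in zip(total_debt, annual_income)
]

print(debt_to_income)  # [0.4, 0.2, 0.0, None]
```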
Greater Than >
If the family unit is larger than the number of seats in the largest car owned by the family, fun summer camping vacations to national parks become less likely.
Multiplication *
Sometimes two related columns interact with a target in a way that is only detectable through their product. This interaction effect is often called a moderated relationship between a column and the target: the strength of that relationship depends on the size of another feature.
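An interaction (product) feature could look like the sketch below. Both column names are hypothetical and chosen only to illustrate one feature scaling the effect of another.

```python
# Hypothetical columns (illustrative values only).
ad_spend = [1.0, 2.0, 3.0, 4.0]
season_index = [0.5, 1.0, 1.5, 2.0]

# Multiplication: the product lets a model see how the effect of ad_spend
# changes with the size of season_index (the moderating feature).
spend_x_season = [spend * season for spend, season in zip(ad_spend, season_index)]

print(spend_x_season)  # [0.5, 2.0, 4.5, 8.0]
```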
Greater Than or Equal >=
If the number of available seats in a car is greater than or equal to the family size, the likelihood of purchasing a new van may be lower.
Absolute Abs()
Similar to subtraction, but uses the actual distance between two numbers; whether the difference is negative or positive is not of importance.
Not Equal !=
When two data points are not the same, this can have an effect on a prediction.
Subtraction -
By subtracting one column from another, the similarity or difference between them becomes more apparent. The closer the result is to 0, the more similar the two values are.
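The contrast between Subtraction and Absolute can be sketched on the same pair of columns. The names and values are hypothetical.

```python
# Hypothetical columns (illustrative values only).
seats_in_largest_car = [5, 5, 7, 4]
family_size = [4, 6, 7, 5]

# Subtraction: signed difference, so the direction of the gap is preserved.
seat_surplus = [seats - size for seats, size in zip(seats_in_largest_car, family_size)]

# Absolute: distance only, so the direction of the gap is discarded.
seat_gap = [abs(diff) for diff in seat_surplus]

print(seat_surplus)  # [1, -1, 0, -1]
print(seat_gap)      # [1, 1, 0, 1]
```

A value of 0 in either feature marks the row where the two columns match.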
Equal ==
If two data points are the same, they may cancel each other out in some cases, or they may indicate a higher likelihood of a target phenomenon occurring.
Addition +
By adding columns together, predictive signals can be increased.
Exponentiation **
When dealing with financial data, the current value of a continuously compounded loan or bond may be of interest. To represent P = C e^(rt) in data transformations, C * e**(r * t) can be used to capture the exponential relationship between e and (r * t).
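The continuous-compounding formula P = C e^(rt) can be sketched as a row-wise transformation. The three column names and their values are hypothetical. Rates are assumed to be annual fractions and times are assumed to be in years.

```python
import math

# Hypothetical columns (illustrative values only).
principal = [1000.0, 2500.0]   # C
annual_rate = [0.05, 0.03]     # r, as a fraction per year (assumption)
years_elapsed = [10.0, 5.0]    # t, in years (assumption)

# Exponentiation: P = C * e**(r * t), computed for each row.
compounded_value = [
    c * math.e ** (r * t)
    for c, r, t in zip(principal, annual_rate, years_elapsed)
]

print([round(p, 2) for p in compounded_value])  # [1648.72, 2904.59]
```

`math.exp(r * t)` would be an equivalent way to write the exponential term; the `**` form is used here only to mirror the operator named above.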
Operators Applied to One Column
Square Root Sqrt()
Works for a different distribution of data. Compare its ability to linearize data with that of Log().
Square Square()
Makes large values even larger.
Natural Logarithm Log()
Generally used to linearize exponential data.
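The three single-column operators can be compared side by side with a small sketch. The data is synthetic: `exponential_column` is generated as e^(0.5x) purely to illustrate how the logarithm turns exponential growth into evenly spaced values.

```python
import math

# Hypothetical column (illustrative values only).
values = [1.0, 4.0, 9.0, 100.0]

sqrt_values = [math.sqrt(v) for v in values]   # compresses large values
squared_values = [v ** 2 for v in values]      # makes large values even larger

print(sqrt_values)     # [1.0, 2.0, 3.0, 10.0]
print(squared_values)  # [1.0, 16.0, 81.0, 10000.0]

# Synthetic exponential data: y = e**(0.5 * x) for x = 0..4.
exponential_column = [math.exp(0.5 * x) for x in range(5)]

# Natural logarithm: recovers 0.5 * x, so consecutive values differ by a constant step.
log_values = [math.log(y) for y in exponential_column]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print([round(step, 6) for step in log_steps])  # [0.5, 0.5, 0.5, 0.5]
```

The constant step in `log_steps` is what "linearized" means here: after the transform, the column grows by the same amount per row.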