Data Transformations
Transformations
Addition (+)
Add columns
Can increase predictive signals
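A minimal Python sketch of adding two columns into a new feature. The column names (`wages`, `bonus`) and their values are hypothetical. Plain lists stand in for table columns so the example needs no third-party library.

```python
# Hypothetical columns: two income sources per household
wages = [52000, 61000, 47000]
bonus = [3000, 0, 5500]

# Element-wise addition creates a new "total_income" column
total_income = [w + b for w, b in zip(wages, bonus)]
print(total_income)  # [55000, 61000, 52500]
```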
Subtraction (-)
Subtract one column from another
Shows similarity or difference
Closer to 0
More similar
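A minimal sketch of a difference column, assuming two hypothetical price columns. A result near 0 means the two columns are close for that row.

```python
# Hypothetical columns: list price and price actually paid
list_price = [20.0, 35.0, 50.0]
paid_price = [20.0, 30.0, 42.5]

# Difference column: values closer to 0 mean the two columns are more similar
discount = [lp - pp for lp, pp in zip(list_price, paid_price)]
print(discount)  # [0.0, 5.0, 7.5]
```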
Absolute value abs()
Similar to subtraction, but ignores the sign
Actual distance between 2 numbers
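A small sketch that contrasts a signed difference with `abs()`, using hypothetical predicted and actual values. `abs()` keeps only the distance, so an over-estimate and an under-estimate of the same size look alike.

```python
# Hypothetical columns: predicted and actual delivery days
predicted = [3, 5, 2]
actual = [4, 2, 2]

# Signed difference keeps the direction; abs() keeps only the distance
signed = [p - a for p, a in zip(predicted, actual)]
distance = [abs(p - a) for p, a in zip(predicted, actual)]
print(signed)    # [-1, 3, 0]
print(distance)  # [1, 3, 0]
```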
Multiplication (*)
Two related columns interact
Target detected through their product
Moderated relationship
Effect of one feature depends on the size of another feature
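A minimal sketch of an interaction (product) feature. The columns are hypothetical. The effect of time outdoors depends on how strong the sun is, and only the product separates the third row from the first two.

```python
# Hypothetical columns: the effect of time outdoors depends on UV strength
hours_outdoors = [1.0, 4.0, 4.0]
uv_index = [8.0, 2.0, 8.0]

# Interaction feature: the product captures exposure neither column shows alone
uv_exposure = [h * u for h, u in zip(hours_outdoors, uv_index)]
print(uv_exposure)  # [8.0, 8.0, 32.0]
```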
Division (/)
Dividing one column by another
Makes information hidden from the algorithm
Available to it
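A sketch of a ratio feature, assuming hypothetical debt and income columns. Neither raw column states the debt burden directly, but the ratio does. The zero-income guard is one possible choice. Another option would be to drop the row.

```python
# Hypothetical columns: total debt and annual income per customer
debt = [20000, 45000, 10000]
income = [80000, 60000, 0]

# Ratio feature; rows with zero income get None instead of raising ZeroDivisionError
debt_to_income = [d / i if i != 0 else None for d, i in zip(debt, income)]
print(debt_to_income)  # [0.25, 0.75, None]
```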
Less than (<)
Seats available in the family's largest car < family size
Family may buy a new car in the future
Less than or equal (<=)
Bedrooms in house <= family size one year after the birth of a baby
Predicts purchase of bunk beds
Greater than (>)
Family size > number of seats in the largest car owned by the family
Less likely to go on summer camping vacation to national parks
Greater than or equal (>=)
Number of available seats in the car >= family size
Less likely to purchase a new van
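A sketch of the family examples above as boolean features. The column names and values are hypothetical. Each comparison yields True/False per row. A model can use these flags directly or as 0/1.

```python
# Hypothetical family-level columns
family_size = [4, 6, 3]
largest_car_seats = [5, 5, 7]
bedrooms = [3, 2, 3]

# Boolean features built with comparison operators
needs_bigger_car = [s < f for s, f in zip(largest_car_seats, family_size)]  # seats < family size
bedrooms_tight = [b <= f for b, f in zip(bedrooms, family_size)]  # bedrooms <= family size
car_fits_family = [s >= f for s, f in zip(largest_car_seats, family_size)]  # seats >= family size
print(needs_bigger_car)  # [False, True, False]
print(bedrooms_tight)    # [True, True, True]
print(car_fits_family)   # [True, False, True]
```

`family_size > largest_car_seats` would give the same flags as `needs_bigger_car`. The < and > forms are mirror images of each other.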
Not equal (!=)
Two data points not the same
Equal (==)
May cancel each other out
Or increase likelihood of target phenomenon
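A sketch of equality and inequality flags on hypothetical billing and shipping columns. Reading a mismatch as a possible gift purchase is only an illustrative assumption.

```python
# Hypothetical columns: billing and shipping state per order
billing_state = ["CA", "NY", "TX"]
shipping_state = ["CA", "NJ", "TX"]

# == flags matches; != flags mismatches (hypothetically, a possible gift purchase)
same_state = [b == s for b, s in zip(billing_state, shipping_state)]
different_state = [b != s for b, s in zip(billing_state, shipping_state)]

# Python booleans convert cleanly to 0/1 for models
print([int(x) for x in same_state])       # [1, 0, 1]
print([int(x) for x in different_state])  # [0, 1, 0]
```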
Exponentiation (**)
Financial data
Example
Current amount owed on a continuously compounded loan
$P = Ce^{rt}$
C = initial principal
r = interest rate
t = time
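A minimal sketch of the continuous-compounding formula using Python's `**` operator. The loan figures are hypothetical, and the rate and time are assumed to be in matching units (annual rate, years).

```python
import math

def continuous_compound(principal: float, rate: float, years: float) -> float:
    """Amount owed under continuous compounding: P = C * e**(r * t)."""
    return principal * math.e ** (rate * years)

# Hypothetical loan: 1,000 at a 5% annual rate for 10 years
amount = continuous_compound(1000.0, 0.05, 10.0)
print(round(amount, 2))  # 1648.72
```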
Numerical
Add
Subtract
Multiply
Two or more columns
To create new columns
One-Column Transformations
Natural logarithm log()
Linearize exponential data
Increase in family income
Decrease in trips to national parks
Family can afford other experiences
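A sketch of how the natural log linearizes exponential data. The income values are hypothetical, each 10 times the previous one. After `math.log()` the gaps become equal steps of ln(10).

```python
import math

# Hypothetical column: family income growing by a constant factor (x10)
income = [20000, 200000, 2000000]

# Natural log turns multiplicative growth into equal additive steps
log_income = [math.log(x) for x in income]
steps = [round(b - a, 4) for a, b in zip(log_income, log_income[1:])]
print(steps)  # [2.3026, 2.3026]
```

`math.log()` needs values greater than 0. For columns that contain zeros, `math.log1p(x)` (the log of 1 + x) is a common substitute.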
Square root sqrt()
Similar to log transformation
Compare its ability to linearize the data against log
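One way to make that comparison is sketched below on a hypothetical column that grows quadratically. After each transform, the code checks whether consecutive steps are equal. Here `sqrt` straightens the data and `log` does not. On exponential data the result would reverse.

```python
import math

# Hypothetical column that grows quadratically (squares of 1..4)
values = [1, 4, 9, 16]

def step_sizes(xs):
    """Differences between consecutive values; equal steps mean the data is linear."""
    return [round(b - a, 3) for a, b in zip(xs, xs[1:])]

sqrt_steps = step_sizes([math.sqrt(v) for v in values])
log_steps = step_sizes([math.log(v) for v in values])
print(sqrt_steps)  # [1.0, 1.0, 1.0]
print(log_steps)   # [1.386, 0.811, 0.575]
```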
Square
Makes large values disproportionately larger
Example
Standard deviation of engine vibrations
Square it
To find the variance
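A sketch of the engine-vibration example using Python's `statistics` module. The readings are hypothetical. The sketch treats them as a whole population, so it uses `pstdev`/`pvariance`. For a sample you would use `stdev`/`variance` instead.

```python
import statistics

# Hypothetical engine vibration readings
readings = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation, then square it to get the variance
std_dev = statistics.pstdev(readings)
variance = std_dev ** 2
print(std_dev, variance)  # 2.0 4.0
```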
Splitting and Extracting
IF THEN
Data transformation
Examines a value in a column
Changes values based on the result
Excel
IF() function
SQL
CASE expression
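A sketch of an IF/THEN transformation in Python. The column, the thresholds, and the category labels are hypothetical. The function plays the same role as a nested Excel `IF()` or a SQL `CASE` expression.

```python
# Hypothetical column: number of children per household
children = [0, 1, 3, 5]

def family_type(n: int) -> str:
    """IF/THEN rule, comparable to a nested Excel IF() or a SQL CASE expression."""
    if n == 0:
        return "no children"
    elif n <= 2:
        return "small family"
    else:
        return "large family"

family_category = [family_type(n) for n in children]
print(family_category)  # ['no children', 'small family', 'large family', 'large family']
```

In SQL the same rule could be written as `CASE WHEN children = 0 THEN 'no children' WHEN children <= 2 THEN 'small family' ELSE 'large family' END`.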
Example
One-hot encoding
Splitting column data
Used because some ML tools cannot analyze multi-category columns
Conversion
Categorical column with 2 or more possible values
To separate 0/1 columns, one per value
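A minimal one-hot encoding sketch in plain Python on a hypothetical `car_type` column. Each distinct value becomes its own 0/1 column. The values are sorted so the new columns come out in a stable order. Libraries such as pandas offer the same operation, but this sketch does not rely on them.

```python
# Hypothetical categorical column
car_type = ["sedan", "van", "suv", "van"]

# One new 0/1 column per distinct category (sorted for a stable column order)
categories = sorted(set(car_type))
one_hot = {f"car_type_{c}": [1 if v == c else 0 for v in car_type] for c in categories}
for name, column in one_hot.items():
    print(name, column)
# car_type_sedan [1, 0, 0, 0]
# car_type_suv [0, 0, 1, 0]
# car_type_van [0, 1, 0, 1]
```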
Text
Extract new columns from columns containing text
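A sketch of extracting new columns from a hypothetical free-text column. It produces a keyword flag and a number pulled out with a regular expression. The keyword and the pattern are illustrative assumptions.

```python
import re

# Hypothetical text column: free-form product descriptions
descriptions = [
    "Full bunk bed, pine, 2 drawers",
    "Twin bed, oak",
    "Bunk bed with 3 drawers",
]

# New column 1: keyword flag (case-insensitive)
is_bunk = [int("bunk" in d.lower()) for d in descriptions]

# New column 2: number of drawers, 0 when the text does not mention any
drawers = []
for d in descriptions:
    match = re.search(r"(\d+)\s+drawers?", d)
    drawers.append(int(match.group(1)) if match else 0)

print(is_bunk)  # [1, 0, 1]
print(drawers)  # [2, 0, 3]
```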
Categorical
Combine multiple categories into fewer
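One common way to combine categories is sketched below. It folds rare values into a single "other" bucket. The column and the `min_count` threshold are hypothetical. Grouping by domain knowledge, such as mapping each vehicle to "car" or "truck", is an equally valid alternative.

```python
from collections import Counter

# Hypothetical categorical column with several rare values
vehicle = ["sedan", "sedan", "van", "suv", "van", "coupe", "truck", "sedan"]

# Keep frequent categories; fold the rest into a single "other" category
counts = Counter(vehicle)
min_count = 2  # hypothetical threshold
combined = [v if counts[v] >= min_count else "other" for v in vehicle]
print(combined)
# ['sedan', 'sedan', 'van', 'other', 'van', 'other', 'other', 'sedan']
```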