Chapter 10. Data Transformations
10.1.1 IF-THEN statements and One-hot encoding
IF-THEN statements are among a data scientist’s best friends during data transformations.
They allow for the examination of a value in a column and the ability to make changes to this value or other values elsewhere in the dataset.
IF-THEN statements allow you to create content in a new column depending on what exists in one or more other columns.
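A minimal sketch of IF-THEN logic that fills new columns from existing ones. The rows, column names (`age`, `income`, `age_group`, `working_age_earner`), and thresholds are hypothetical and chosen only for illustration.

```python
# Hypothetical rows; column names and thresholds are illustrative assumptions.
rows = [
    {"age": 17, "income": 0},
    {"age": 34, "income": 52000},
    {"age": 71, "income": 18000},
]


def age_group(age):
    """IF-THEN rule that maps a numeric age to a categorical label."""
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    else:
        return "senior"


for row in rows:
    # New column that depends on one existing column.
    row["age_group"] = age_group(row["age"])
    # New column that depends on two existing columns.
    row["working_age_earner"] = 18 <= row["age"] < 65 and row["income"] > 0
```

The same pattern extends to any number of input columns: each rule reads existing values and writes its result to a new key.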
One-hot encoding is the conversion of a categorical column containing two or more possible values into discrete columns, one representing each value.
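As a rough illustration of one-hot encoding, the sketch below turns one categorical column into one 0/1 column per distinct value. The helper `one_hot`, the `color` data, and the `prefix_value` naming scheme are assumptions made for this example.

```python
def one_hot(values, prefix):
    """Return one dict per input value, with a 0/1 entry for every category."""
    categories = sorted(set(values))
    return [
        {f"{prefix}_{c}": int(v == c) for c in categories}
        for v in values
    ]


# Hypothetical categorical column.
colors = ["red", "green", "red", "blue"]
encoded = one_hot(colors, "color")
```

Each encoded row contains exactly one 1, in the column that matches its original category.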
An argument can also be made that the model should only use the training data to impute so that this number will be ready for coding new values once the system is put into production.
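One way to read this point, sketched under the assumption that "this number" is a fill value such as the training mean: the value is learned from the training data only and then reused unchanged for new rows. The data and the helper `impute` are hypothetical.

```python
from statistics import mean

# Hypothetical column with missing entries marked as None.
train_ages = [22, None, 35, 42, None]
new_ages = [None, 58]  # rows that arrive once the system is in production

# Learn the fill value from the training data only (assumed here to be the mean).
fill_value = mean([a for a in train_ages if a is not None])


def impute(values, fill):
    """Replace missing entries with a previously learned fill value."""
    return [fill if v is None else v for v in values]


train_imputed = impute(train_ages, fill_value)
new_imputed = impute(new_ages, fill_value)  # reuses the training-time value
```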
10.2. Transformations
Often, the comparison of two columns can create a new column that provides an additional predictive ability for machine learning algorithms.
Operators Applied to Two Columns.
Addition
Subtraction
Absolute
Multiplication
Division
Less than
Less than or equal
Greater than
Greater than or equal
Not equal
When two data points are not the same, this can have an effect on a prediction. (!=)
Equal
If two data points are the same, they may cancel each other out in some cases, or they may indicate a higher likelihood of a target phenomenon occurring. (==)
Exponentiation
When dealing with financial data, the current interest of a continuously compounded loan or bond may be of interest. To represent P = C * e^(rt) in data transformations, C * e**(r * t) can be used to capture the exponential relationship between e and (r * t). (**)
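The two-column operators listed above can be sketched as derived columns. The rows and column names (`a`, `b`) are hypothetical, "Absolute" is assumed here to mean the absolute difference between the two columns, and the `compound` helper with its inputs is an illustrative assumption for the exponentiation case.

```python
import math

# Hypothetical rows with two numeric columns.
rows = [
    {"a": 5.0, "b": 3.0},
    {"a": 2.0, "b": 2.0},
    {"a": 1.0, "b": 4.0},
]

for row in rows:
    a, b = row["a"], row["b"]
    row["sum"] = a + b
    row["difference"] = a - b
    row["abs_difference"] = abs(a - b)  # assumed meaning of "Absolute"
    row["product"] = a * b
    row["ratio"] = a / b if b != 0 else None  # guard against division by zero
    row["a_lt_b"] = a < b
    row["a_le_b"] = a <= b
    row["a_gt_b"] = a > b
    row["a_ge_b"] = a >= b
    row["a_ne_b"] = a != b
    row["a_eq_b"] = a == b


def compound(c, r, t):
    """Continuously compounded value P = C * e**(r * t)."""
    return c * math.e ** (r * t)


# Hypothetical inputs: C = 1000, r = 5% per year, t = 2 years.
value = compound(1000.0, 0.05, 2.0)
```

The boolean columns can be kept as they are or cast to 0/1, depending on what the downstream algorithm expects.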
Operators Applied to One Column.
Square root
Suits a different distribution of data than the logarithm does. Compare its ability to linearize the data against that of the log transformation.
Square
Natural logarithm
Log transformations are generally used to linearize exponential data.
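A small sketch of the one-column operators, using a hypothetical exponential series y = 3 * e**(0.5 * x) to show why the log transform linearizes such data. The series and variable names are assumptions for illustration.

```python
import math

# Hypothetical exponential series: y = 3 * e**(0.5 * x).
xs = [0, 1, 2, 3, 4]
ys = [3 * math.exp(0.5 * x) for x in xs]

sqrt_ys = [math.sqrt(y) for y in ys]   # square root
squared_ys = [y ** 2 for y in ys]      # square
log_ys = [math.log(y) for y in ys]     # natural logarithm

# Differences between consecutive log values; a constant step means
# the transformed series is linear in x.
log_steps = [log_ys[i + 1] - log_ys[i] for i in range(len(log_ys) - 1)]
```

Because log(3 * e**(0.5 * x)) = log(3) + 0.5 * x, every entry of `log_steps` is approximately 0.5, while the square-root and squared series remain curved.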