Chapter 10. Data Transformations
10.1.1 IF-THEN statements and One-hot encoding
IF-THEN statements are among a data scientist’s best friends during data transformations.
They let you examine a value in a column and change that value or other values elsewhere in the dataset.
IF-THEN statements allow you to create content in a new column depending on what exists in one or more other columns.
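As a minimal sketch of the idea, the following Python snippet derives a new column from two existing ones with an IF-THEN rule. The records, the column names (income, debt, risk_band), and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
# Hypothetical loan records; column names and values are illustrative only.
rows = [
    {"income": 42000, "debt": 30000},
    {"income": 85000, "debt": 10000},
    {"income": 30000, "debt": 0},
]


def risk_band(row):
    """IF-THEN rule: derive a label from two existing columns."""
    if row["debt"] == 0:
        return "none"
    elif row["debt"] / row["income"] > 0.5:  # assumed threshold
        return "high"
    else:
        return "low"


# Create the new column by applying the rule to every row.
for row in rows:
    row["risk_band"] = risk_band(row)
```

The rule lives in one function, so the same logic can be reapplied unchanged to new records.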
One-hot encoding is the conversion of a categorical column containing two or more possible values into discrete columns, one for each value.
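One way to picture one-hot encoding is the plain-Python sketch below. The helper name one_hot and the sample column color are assumptions made for this example; data-frame libraries generally offer an equivalent built-in operation.

```python
def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every column except the one being encoded.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical data: a categorical column with two possible values.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "blue"},
    {"id": 3, "color": "red"},
]
encoded = one_hot(rows, "color")
```

Each row ends up with exactly one of the new columns set to 1.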
An argument can also be made that imputation should use only the training data, so that the imputed value is fixed and ready to apply to new values once the system is put into production.
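The argument for using only the training data can be sketched as follows. The median is used here as the imputation statistic, which is an assumption for this example, and the function names fit_imputer and impute are hypothetical.

```python
from statistics import median


def fit_imputer(train_values):
    """Compute the fill value from the training data only."""
    observed = [v for v in train_values if v is not None]
    return median(observed)


def impute(values, fill_value):
    """Replace missing entries (None) with a previously fitted value."""
    return [fill_value if v is None else v for v in values]


# Hypothetical training column with one missing value.
train_age = [25, None, 40, 31]
fill = fit_imputer(train_age)
train_filled = impute(train_age, fill)

# In production, reuse the stored value instead of recomputing it.
production_age = [None, 52]
production_filled = impute(production_age, fill)
```

Because the fill value is computed once and stored, new records are treated exactly as the training records were.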
Often, the comparison of two columns can create a new column that provides an additional predictive ability for machine learning algorithms.
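A minimal sketch of such two-column comparisons, assuming hypothetical columns list_price and sale_price, might look like this. Each comparison operator produces a new Boolean column.

```python
# Hypothetical records; the column names are illustrative only.
rows = [
    {"list_price": 100, "sale_price": 90},
    {"list_price": 80, "sale_price": 80},
    {"list_price": 50, "sale_price": 65},
]

for row in rows:
    a, b = row["list_price"], row["sale_price"]
    row["list_le_sale"] = a <= b   # less than or equal
    row["list_ge_sale"] = a >= b   # greater than or equal
    row["prices_differ"] = a != b  # the two values are not the same
    row["prices_equal"] = a == b   # the two values are the same
```

Whether any of these derived columns adds predictive ability depends on the data and has to be checked empirically.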
Operators Applied to Two Columns.
Less than or equal (<=)
Greater than or equal (>=)
When two data points are not the same, this can have an effect on a prediction. (!=)
If two data points are the same, they may cancel each other out in some cases, or they may indicate a higher likelihood of a target phenomenon occurring. (==)
When dealing with financial data, the current value of a continuously compounded loan or bond may be of interest. To represent P = C e^(rt) in data transformations, C * e^(rt) can be used to capture the exponential relationship between e and (rt).
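The continuous-compounding formula P = C e^(rt) can be computed as a new column along the following lines. The sample values are made up, and the column names simply follow the symbols C, r, and t used in the formula.

```python
import math

# Hypothetical records: starting amount C, annual rate r, time t in years.
rows = [
    {"C": 1000.0, "r": 0.05, "t": 10.0},
    {"C": 2500.0, "r": 0.03, "t": 0.0},
]

# New column P = C * e^(r * t), built from three existing columns.
for row in rows:
    row["P"] = row["C"] * math.exp(row["r"] * row["t"])
```

With t = 0 the exponent is zero, so P equals C, which is a quick sanity check on the transformation.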
Operators Applied to One Column.
Works for a different distribution of data. Compare its ability to linearize data with that of the log transformation.
Log transformations are generally used to
linearize exponential data.
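To see why a log transformation linearizes exponential data, consider the sketch below. The synthetic data y = 3 e^(0.5x) is an assumption made for this example. Taking the natural log gives ln(y) = ln(3) + 0.5x, which is a straight line in x.

```python
import math

# Synthetic exponential data: y = 3 * e^(0.5 * x).
x = [0, 1, 2, 3, 4]
y = [3.0 * math.exp(0.5 * xi) for xi in x]

# Log-transform the column.
log_y = [math.log(yi) for yi in y]

# For evenly spaced x, a linear series has constant successive differences.
diffs = [log_y[i + 1] - log_y[i] for i in range(len(log_y) - 1)]
```

The differences in log_y are all approximately 0.5, the growth rate in the exponent, while the differences in the raw y values keep increasing.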