Chapter 10: Data Transformations
Jacob Harvey
Types of Transformations
Text: Extract new columns from columns containing text
Categorical: Combine numerous categories into fewer categories, or turn a multi-category column into several binary columns
Numerical: Add, subtract, multiply, or otherwise combine two or more columns to create new columns.
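A minimal sketch of the text and categorical transformations, assuming plain Python rows stored as dictionaries. The columns `email` and `region`, the sample values, and the `REGION_GROUPS` mapping are all hypothetical and chosen only for illustration.

```python
# Hypothetical rows; column names and values are illustrative only.
rows = [
    {"email": "ana@example.com", "region": "Northeast"},
    {"email": "bo@sample.org", "region": "Southwest"},
    {"email": "cy@example.com", "region": "Northwest"},
]

# Hypothetical mapping that collapses four regions into two broader groups.
REGION_GROUPS = {
    "Northeast": "East",
    "Southeast": "East",
    "Northwest": "West",
    "Southwest": "West",
}


def transform(row):
    """Return a copy of row with one text-derived and one collapsed column."""
    out = dict(row)
    # Text: extract a new column (the email domain) from a text column.
    out["email_domain"] = row["email"].split("@", 1)[1]
    # Categorical: map detailed categories to a coarser group; unknowns become "Other".
    out["region_group"] = REGION_GROUPS.get(row["region"], "Other")
    return out


transformed = [transform(r) for r in rows]
for r in transformed:
    print(r["email_domain"], r["region_group"])
```

Mapping unknown categories to "Other" is one design choice; an analyst could instead flag them for review.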
IF-THEN Statements:
Examine a value in a column and, based on the result, change that value or other values elsewhere in the dataset.
Excel = IF
SQL = CASE
Example: IF EmployeeID = 1 THEN Emp_1 = 1 ELSE Emp_1 = 0 ENDIF
They allow you to create new content depending on what exists in one or more other columns
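The IF-THEN example can be sketched in Python as follows, assuming employee records held as dictionaries. Only `EmployeeID` and `Emp_1` come from the example; the three records are hypothetical. The SQL and Excel equivalents appear as comments, with cell `A2` assumed to hold the EmployeeID.

```python
# Hypothetical employee records; only the EmployeeID field comes from the example.
employees = [{"EmployeeID": 1}, {"EmployeeID": 2}, {"EmployeeID": 3}]

for emp in employees:
    # IF EmployeeID = 1 THEN Emp_1 = 1 ELSE Emp_1 = 0 ENDIF
    # SQL:   CASE WHEN EmployeeID = 1 THEN 1 ELSE 0 END AS Emp_1
    # Excel: =IF(A2=1, 1, 0)   (assumes A2 holds the EmployeeID)
    emp["Emp_1"] = 1 if emp["EmployeeID"] == 1 else 0

print([emp["Emp_1"] for emp in employees])  # [1, 0, 0]
```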
One-Hot Encoding: The conversion of a categorical column containing two or more possible values into discrete columns, one per value.
This approach is used because some machine-learning tools cannot properly analyze the content of multi-category features.
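A minimal one-hot encoding sketch in plain Python, assuming rows stored as dictionaries. The `one_hot` helper, the `dept` column, and the `<column>_<value>` naming convention are hypothetical illustrations, not a specific tool's API.

```python
def one_hot(rows, column):
    """Replace `column` in each row with one 0/1 column per distinct value.

    New columns use a hypothetical "<column>_<value>" naming convention.
    """
    # Sort the distinct values so the new columns appear in a stable order.
    values = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the original categorical column.
        out = {k: v for k, v in row.items() if k != column}
        for value in values:
            out[f"{column}_{value}"] = 1 if row[column] == value else 0
        encoded.append(out)
    return encoded


rows = [{"id": 1, "dept": "Sales"}, {"id": 2, "dept": "IT"}, {"id": 3, "dept": "HR"}]
for row in one_hot(rows, "dept"):
    print(row)
```

Each output row has exactly one 1 among its `dept_` columns. Applied to an EmployeeID column, the same idea produces binary columns like the `Emp_1` example above.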
If you want to understand which employees performed best in the past, use employee IDs.
If you want to create a system designed to predict future performance, it makes more sense to include characteristics of your employees (e.g., height, age, gender).
Transformations One Column at a Time:
Natural Logarithm:
Log(): Used to linearize exponential data.
Square Root:
Sqrt(): Also linearizes data, but suits a different distribution than Log() because it compresses large values less strongly.
Square:
Square(): Makes large values even larger.
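A short sketch of the three single-column transformations, assuming Python's `math.log` (natural log), `math.sqrt`, and the `**` operator stand in for the Log(), Sqrt(), and Square() functions named above. The exponential data `y = 3 * 2**x` is hypothetical.

```python
import math

# Hypothetical exponential data: y doubles each time x increases by 1.
xs = [0, 1, 2, 3, 4]
ys = [3 * 2 ** x for x in xs]  # [3, 6, 12, 24, 48]

# ln(3 * 2**x) = ln(3) + x * ln(2), so the logged values rise by a constant step.
log_ys = [math.log(y) for y in ys]
steps = [b - a for a, b in zip(log_ys, log_ys[1:])]
print(all(math.isclose(s, math.log(2)) for s in steps))  # True: a straight line in x

# Square root compresses large values, but less strongly than log.
values = [1, 100, 10_000]
print([math.sqrt(v) for v in values])  # [1.0, 10.0, 100.0]
print([round(math.log(v), 2) for v in values])  # [0.0, 4.61, 9.21]

# Squaring stretches large values further apart.
print([v ** 2 for v in [1, 10, 100]])  # [1, 100, 10000]
```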
Operators Applied to Two Columns:
Addition (+), Subtraction (-), Multiplication (*), Division (/), Exponentiation (**), Absolute Value Abs(),
Less Than (<), Less Than or Equal (<=), Greater Than (>), Greater Than or Equal (>=), Not Equal (!=), Equal (==)
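A sketch of several of these operators combining two columns into new ones, assuming plain Python rows. The `revenue` and `cost` columns and their values are hypothetical, and the derived column names are illustrative.

```python
# Hypothetical columns; names and values are illustrative only.
rows = [
    {"revenue": 120.0, "cost": 80.0},
    {"revenue": 50.0, "cost": 65.0},
]

for row in rows:
    r, c = row["revenue"], row["cost"]
    row["profit"] = r - c                              # subtraction
    row["abs_gap"] = abs(r - c)                        # absolute value of a difference
    row["margin"] = (r - c) / r if r != 0 else None    # division, guarded against zero
    row["is_profitable"] = r > c                       # comparison yields a boolean column

print([(row["profit"], row["abs_gap"], row["is_profitable"]) for row in rows])
# [(40.0, 40.0, True), (-15.0, 15.0, False)]
```

Guarding the division avoids a ZeroDivisionError when a denominator column contains zeros; returning `None` marks those rows as missing rather than inventing a value.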