Larson Chapter 10
Creating new features (i.e. columns)
Text
extract new columns from text
may require several transformations
Categorical
consolidate categories
Numerical
apply mathematical equations to columns
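A minimal sketch of the three feature types above, using plain Python. The records, the column names (email, region, revenue, units), and the REGION_MAP lookup are all hypothetical and do not come from the chapter; they only illustrate a chained text transformation, a category consolidation, and a calculated numerical column.

```python
# Hypothetical records; none of these column names come from the chapter.
rows = [
    {"email": " Ann@Example.com ", "region": "NY", "revenue": 1200.0, "units": 40},
    {"email": "bob@sample.org", "region": "N.Y.", "revenue": 300.0, "units": 15},
    {"email": "cy@example.com", "region": "Calif", "revenue": 900.0, "units": 30},
]

# Assumed lookup that folds spelling variants into one category each.
REGION_MAP = {"NY": "New York", "N.Y.": "New York", "Calif": "California"}


def add_features(row):
    """Return a copy of the row with three derived columns added."""
    out = dict(row)
    # Text: several transformations in sequence (trim, lowercase, split).
    out["email_domain"] = row["email"].strip().lower().split("@")[1]
    # Categorical: consolidate variants; unknown values fall back to "Other".
    out["region_clean"] = REGION_MAP.get(row["region"], "Other")
    # Numerical: apply an equation to existing columns.
    out["price_per_unit"] = row["revenue"] / row["units"]
    return out


featured = [add_features(r) for r in rows]
```

Each original column is kept alongside the derived one, so the transformation can be checked row by row.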
Splitting and Extracting New Columns
IF-THEN statements
examine an existing column and write the resulting value to a new column
called IF in Excel and CASE in SQL
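The IF-THEN idea can be sketched in Python as a function that reads one value and returns the value for the new column. The thresholds, the tier labels, and the name sales_tier are assumptions made for illustration, not values from the chapter.

```python
def sales_tier(amount):
    """Map a sales amount to a label, like a CASE expression with three branches."""
    if amount >= 1000:
        return "high"
    elif amount >= 500:
        return "medium"
    return "low"  # the ELSE branch


# Hypothetical existing column and the new column derived from it.
sales = [1200, 650, 90]
tiers = [sales_tier(a) for a in sales]
```

The branches are checked in order, so each amount receives the first label whose condition it meets.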
one-hot encoding
splitting one column containing categorical data with more than one outcome into individual columns
a unique column for each outcome with binary values
aka "dummy encoding"
used because not all ML tools can handle multi-categorical data
EXAMPLE
employee column in sales data split into one binary column per employee
improves predictive ability
improves further if you include employee characteristics
age
personality factors
geographic background
could potentially be used to pair employees with specific clients
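A minimal sketch of one-hot encoding for the employee example, written in plain Python. The helper one_hot, the employee names, and the amounts are hypothetical. Libraries such as pandas offer a built-in equivalent, but the loop below shows what the encoding actually does.

```python
def one_hot(rows, column):
    """Replace one categorical column with a 0/1 column per distinct value."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Hypothetical sales records, one row per sale.
sales = [
    {"employee": "Avery", "amount": 500},
    {"employee": "Blake", "amount": 300},
    {"employee": "Avery", "amount": 700},
]
encoded = one_hot(sales, "employee")
```

Every row ends up with exactly one 1 among the new employee columns, which is the binary layout described above.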