Chapter 10: Data Transformations
If-Then Statements and One-Hot Encoding
If-Then Statement allows...
Examination of a value in a column
Ability to change that column's value or other values in the dataset
Note: All tools in this branch have If-Then capabilities
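A minimal If-Then sketch in Python, assuming a hypothetical table stored as a list of dictionaries with made-up columns `employee_id`, `sales`, and `tier`. The rule examines one column's value and, based on it, changes another value in the same row.

```python
# Hypothetical dataset: each row is a dict; column names are illustrative only.
rows = [
    {"employee_id": "E1", "sales": 1200, "tier": None},
    {"employee_id": "E2", "sales": 300, "tier": None},
]

def apply_if_then(rows, threshold=1000):
    """IF sales >= threshold THEN tier = 'high' ELSE tier = 'low'."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are left untouched
        if new_row["sales"] >= threshold:  # examine a value in one column
            new_row["tier"] = "high"       # change a value in another column
        else:
            new_row["tier"] = "low"
        result.append(new_row)
    return result

labeled = apply_if_then(rows)
for row in labeled:
    print(row["employee_id"], row["tier"])
```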
One-hot encoding
Is the conversion of a categorical column containing two or more possible values into discrete columns, one representing each value
Also known as dummy encoding, because each new column uses binary values (1 = true, 0 = false) to indicate whether a row has that value. This matters because some machine learning tools cannot properly analyze multi-category features.
Predictive ability can improve when using one-hot encoding, for example when analyzing employee IDs
If a company includes more data than just the ID, it can improve sales by matching employees with like-minded consumers
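A minimal one-hot encoding sketch in plain Python, using a hypothetical list of employee IDs. Each distinct value becomes its own 0/1 column, and exactly one column is 1 in each row. Some definitions of dummy encoding drop one of these columns to avoid redundancy; this sketch keeps all of them.

```python
def one_hot(values):
    """Convert a categorical column into one binary column per distinct value."""
    categories = sorted(set(values))  # one new column per category
    encoded = []
    for v in values:
        # 1 in the column matching this row's value, 0 everywhere else
        encoded.append({f"is_{c}": int(v == c) for c in categories})
    return categories, encoded

employee_ids = ["E1", "E2", "E1", "E3"]  # hypothetical sample column
categories, encoded = one_hot(employee_ids)
print(categories)   # ['E1', 'E2', 'E3']
print(encoded[0])   # {'is_E1': 1, 'is_E2': 0, 'is_E3': 0}
```

Libraries such as pandas offer the same conversion through `get_dummies`.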
Regular Expressions (RegEx)
Used to find, replace, or extract content from text
Powerful
Flexible
Ever used Ctrl+F in Word? That is a plain text search for exact characters; RegEx works similarly but searches for patterns instead of exact text.
Extremely helpful in text data that has a predictable format.
Emails use the @ symbol, making them easy to find
Phone numbers use hyphens (-), making them easy to find
Area codes always have 3 digits, making them easier to find
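A short sketch using Python's built-in `re` module to pull emails, phone numbers, and area codes out of an invented text string. The patterns assume US-style `XXX-XXX-XXXX` phone numbers and are simplified; they are not full validators for every email or phone format.

```python
import re

# Invented sample text for illustration.
text = "Contact jane.doe@example.com or call 303-555-0199; backup: sam@test.org, 720-555-0123."

# Simplified email pattern: name characters, '@', a domain, a dot, a suffix.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"
# Assumed US-style phone number: 3 digits, hyphen, 3 digits, hyphen, 4 digits.
PHONE = r"\d{3}-\d{3}-\d{4}"
# Same phone pattern with the 3-digit area code captured in a group.
AREA_CODE = r"(\d{3})-\d{3}-\d{4}"

emails = re.findall(EMAIL, text)
phones = re.findall(PHONE, text)
area_codes = re.findall(AREA_CODE, text)  # with one group, findall returns just that group

print(emails)      # ['jane.doe@example.com', 'sam@test.org']
print(phones)      # ['303-555-0199', '720-555-0123']
print(area_codes)  # ['303', '720']
```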
Requires that patterns be written generally (describe the kind of character, not the exact text)
\d matches any digit character; \D matches any non-digit character
\w matches any word character (letters, digits, and underscore); \W matches any other character
. (period) matches any single character except a newline
? makes the preceding character optional. Ex: colou?r matches both color and colour
\s matches whitespace (spaces, tabs, new lines, etc.); \S matches any non-whitespace character
Square brackets [] define a list of allowable characters; a hyphen (-) inside them specifies a range (e.g., [a-z] or [0-9])
| is the OR operator: placed between two patterns, it matches either one (e.g., cat|dog)
^ anchors the match to the beginning of a string when used outside square brackets; as the first character inside brackets, it negates the set (e.g., [^0-9])
* matches the preceding pattern zero or more times, so the pattern can be skipped entirely
+ matches the preceding pattern one or more times, so it must occur at least once
{} specifies how many times a pattern can appear, with lower and upper bounds (e.g., \d{3} or \d{2,4})
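The following sketch runs each metacharacter from the list through Python's `re.findall`, which returns every non-overlapping match. The sample strings are invented for illustration; the comments show the matches each pattern returns.

```python
import re

# (pattern, sample text) pairs; the comment shows what re.findall returns.
CASES = [
    (r"\d", "a1b2"),               # any digit            -> ['1', '2']
    (r"\D", "a1b2"),               # any non-digit        -> ['a', 'b']
    (r"\w+", "hi_there, 42!"),     # word characters      -> ['hi_there', '42']
    (r"\W", "hi there!"),          # non-word characters  -> [' ', '!']
    (r"\s", "a b\tc"),             # whitespace           -> [' ', '\t']
    (r"\S", "a b"),                # non-whitespace       -> ['a', 'b']
    (r"c.t", "cat cot c t"),       # any single character -> ['cat', 'cot', 'c t']
    (r"colou?r", "color colour"),  # optional 'u'         -> ['color', 'colour']
    (r"[a-c]", "abcxyz"),          # range in brackets    -> ['a', 'b', 'c']
    (r"[^0-9]", "a1b"),            # negated set          -> ['a', 'b']
    (r"cat|dog", "cat dog cow"),   # either alternative   -> ['cat', 'dog']
    (r"^Hi", "Hi there, Hi"),      # start of string only -> ['Hi']
    (r"ab*", "a ab abbb"),         # zero or more b's     -> ['a', 'ab', 'abbb']
    (r"ab+", "a ab abbb"),         # one or more b's      -> ['ab', 'abbb']
    (r"\d{2,3}", "1 12 1234"),     # 2 to 3 digits        -> ['12', '123']
]

results = {pattern: re.findall(pattern, text) for pattern, text in CASES}

for pattern, text in CASES:
    print(f"{pattern:10} on {text!r:16} -> {results[pattern]}")
```

Note the last case: `{2,3}` is greedy, so in `1234` it takes `123` and the leftover `4` is too short to match again.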
The primary goal of RegEx here is to extract instances of a specified pattern from a string, which allows new binary columns to be created (1 if the pattern is found, 0 if not).
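A minimal sketch of turning a RegEx match into new columns, assuming a hypothetical free-text `comments` column: one binary column flags whether a US-style phone number appears, and a second column holds the extracted value.

```python
import re

# Hypothetical free-text column.
comments = [
    "Please call me at 303-555-0199",
    "No phone, email me instead",
    "Reach me on 720-555-0123 after 5pm",
]

PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")  # assumed XXX-XXX-XXXX format

def first_match(text):
    """Return the first phone number found in text, or None."""
    m = PHONE.search(text)
    return m.group(0) if m else None

# New binary column: 1 if the comment contains a phone number, else 0.
has_phone = [1 if PHONE.search(c) else 0 for c in comments]
# New value column: the extracted phone number (None when absent).
phone_value = [first_match(c) for c in comments]

print(has_phone)    # [1, 0, 1]
print(phone_value)  # ['303-555-0199', None, '720-555-0123']
```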
Transforms
Addition = +
Subtraction = -
Absolute value = abs()
Multiplication = *
Division = /
Less than = <
Less than or equal = <=
Greater than = >
Greater than or equal = >=
Not equal = !=
Equal = ==
Exponentiation = **
Natural Logarithm = Log()
Square root = Sqrt()
Square = Square()
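A sketch of these transforms applied to two hypothetical numeric columns in Python. The function names in the list (`Log()`, `Sqrt()`, `Square()`) appear to be the course tool's own; the Python equivalents assumed here are `math.log` (natural log), `math.sqrt`, and `** 2`.

```python
import math

# Hypothetical numeric columns of equal length.
price = [10.0, 25.0, 4.0]
cost = [12.0, 20.0, 1.0]

total = [p + c for p, c in zip(price, cost)]    # addition
profit = [p - c for p, c in zip(price, cost)]   # subtraction
abs_profit = [abs(x) for x in profit]           # absolute value
doubled = [p * 2 for p in price]                # multiplication
markup = [p / c for p, c in zip(price, cost)]   # division
is_loss = [x < 0 for x in profit]               # comparison -> True/False
same = [p == c for p, c in zip(price, cost)]    # equality -> True/False
price_cubed = [p ** 3 for p in price]           # exponentiation
log_price = [math.log(p) for p in price]        # natural logarithm (base e)
sqrt_price = [math.sqrt(p) for p in price]      # square root
price_sq = [p ** 2 for p in price]              # square

print(profit)   # [-2.0, 5.0, 3.0]
print(is_loss)  # [True, False, False]
```

Comparison transforms produce True/False values, which can serve directly as binary columns.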
Steven Chesney
Email: stch1109@colorado.edu
Class: 3201-002
Prof: Larsen