10 Data Transformations
10.1 Splitting and extracting new columns
if-then rules and one-hot encoding
If-then
the familiar conditional rule: if a condition holds, assign one value, otherwise another
One-hot encoding
converts a categorical column into binary (0/1) columns, one per category
can improve predictive performance
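A minimal sketch of both ideas in plain Python. The data, the column names and the age threshold are assumptions made up for illustration; a dataframe library would normally handle this, but the logic is the same.

```python
# Hypothetical example data; names and values are assumptions for illustration.
ages = [15, 42, 67]
colors = ["red", "green", "red", "blue"]

# If-then: derive a new 0/1 column from a condition on an existing column.
is_adult = [1 if age >= 18 else 0 for age in ages]


def one_hot(values):
    """Return one 0/1 column per distinct category found in values."""
    categories = sorted(set(values))
    return {
        f"is_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


encoded = one_hot(colors)

print(is_adult)
for name, column in encoded.items():
    print(name, column)
```

Each original category becomes its own column, so every row has exactly one 1 across the new columns.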
10.1.2 Regular expressions
among the best skills to invest time in for working with text data
finds
replaces
transforms
extracts
used to specify the context for a pattern
the simple version is Ctrl+F in Word
any text with a predictable format can be extracted
emails are predictable
matching a format needs general pattern commands rather than literal text
\ (backslash) marks a special command
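A short sketch of finding, extracting and replacing emails with Python's `re` module. The pattern is a deliberately simplified assumption, not a complete email validator, and the sample text is made up.

```python
import re

# Simplified email pattern (an assumption for illustration):
# name characters, then "@", then a domain containing at least one dot.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

notes = "Contact ana@example.com or bo.lee@mail.example.org for details."

# Find and extract: every substring that fits the pattern.
emails = EMAIL.findall(notes)

# Replace: swap each match for a placeholder.
masked = EMAIL.sub("<email>", notes)

print(emails)
print(masked)
```

The same pattern does the finding, the extracting and the replacing, which is why a predictable format is all that is needed.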
Text
new columns extracted from within existing columns
Categorical
combine numerous categories into fewer
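One way to collapse many categories into fewer is a lookup table with a fallback group. The category names and the "other" bucket below are assumptions for illustration.

```python
# Hypothetical mapping from detailed categories to broader groups.
GROUPS = {
    "sedan": "car",
    "hatchback": "car",
    "pickup": "truck",
    "semi": "truck",
}

vehicles = ["sedan", "semi", "hatchback", "scooter"]

# Anything missing from the mapping falls into a catch-all "other" group.
grouped = [GROUPS.get(v, "other") for v in vehicles]

print(grouped)
```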
Numerical
add, subtract, or multiply two or more columns to create new columns
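A small sketch of building new numerical columns from existing ones. The rows and column names are invented for illustration.

```python
# Hypothetical rows; every column name here is an assumption.
rows = [
    {"price": 10.0, "quantity": 3, "discount": 2.0, "shipping": 5.0, "tax": 2.5},
    {"price": 4.5, "quantity": 10, "discount": 0.0, "shipping": 0.0, "tax": 1.5},
]

for row in rows:
    row["gross"] = row["price"] * row["quantity"]   # multiply two columns
    row["net"] = row["gross"] - row["discount"]     # subtract one column from another
    row["charges"] = row["shipping"] + row["tax"]   # add two columns

for row in rows:
    print(row["gross"], row["net"], row["charges"])
```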
types of regex
[]
the character in that position can be any one of the characters listed inside the brackets
^ (inside brackets)
the character in that position cannot be any of the characters listed
|
pattern on left or the right is acceptable
^ without brackets
specifies the beginning of the string being examined
*
the preceding character may appear zero or more times, so it can be absent
+
the preceding character must appear at least once
ab+
the match begins with a
a must be followed by at least one b
{}
a{3}
aaa
a{2,4}
matches between two and four a's in a row
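The operators above can be checked one at a time with Python's `re` module. The test strings and the `matches` helper are assumptions chosen for illustration.

```python
import re


def matches(pattern, text):
    """Return True when the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


results = {
    "set": matches(r"gr[ae]y", "grey"),         # [ae] accepts a or e
    "negated": matches(r"gr[^ae]y", "grey"),    # [^ae] rejects a and e
    "either": matches(r"cat|dog", "hot dog"),   # left or right side may match
    "anchored": matches(r"^dog", "hot dog"),    # ^ requires the start of the string
    "star": matches(r"^ab*$", "a"),             # * allows zero b's
    "plus": matches(r"^ab+$", "a"),             # + requires at least one b
    "exact": matches(r"^a{3}$", "aaa"),         # exactly three a's
    "range": matches(r"^a{2,4}$", "aaaaa"),     # five a's is outside two to four
}

for name, outcome in results.items():
    print(name, outcome)
```

The `star` and `plus` lines differ only in the quantifier, which shows that `*` lets the character be missing while `+` does not.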
examples
0-9
the exact range of digits, used inside []
a-z
the exact range of lowercase letters; A-Z works the same way for uppercase
\d
any one digit
\D
any one character that is not a digit
\w
any alphanumeric character or underscore
\W
any non-alphanumeric character
.
any character
?
the character preceding ? is optional
\s
whitespace
\S
anything but whitespace
[]
any one character listed within the brackets
[^]
any one character not listed after the ^
|
pattern on either side of the pipe
^
the pattern must match at the start of the string being examined
$
the pattern must end at the end of the string being examined
*
zero or more repetitions
+
one or more repetitions
{m}
the pattern preceding the curly brackets must repeat exactly "m" times
{m,n}
the preceding pattern must repeat "m" to "n" times
()
captures the part of the string that fits the pattern inside the parentheses
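A short sketch that uses several of the shortcuts above on one made-up string. The sample text is an assumption for illustration.

```python
import re

text = "Order 66 shipped 2024-03-15 to  Ana"  # hypothetical example text

# \d+ : runs of one or more digits
digits = re.findall(r"\d+", text)

# \s+ : split on any run of whitespace, including the double space
tokens = re.split(r"\s+", text)

# \w+$ : the last run of word characters, anchored to the end of the string
last_word = re.search(r"\w+$", text).group()

# u? : the u is optional, so both spellings match
spellings = re.findall(r"colou?r", "color colour")

# () : each pair of parentheses captures one part of the date
year, month, day = re.search(r"(\d{4})-(\d{2})-(\d{2})", text).groups()

print(digits, tokens, last_word, spellings, year, month, day)
```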
operators applied to two columns
Transformations
+
adding columns can increase predictive signal
-
used to show the similarity or difference between columns
Abs()
size of the difference between numbers, ignoring whether it is positive or negative
*
/
ratios can reveal hidden variables
log()
linearizes exponential data
example
a higher family income makes more things available, but the benefit may level off as income keeps rising
sqrt()
square()
makes large values even larger
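A minimal sketch of these transformations using only the standard library. The rows and column names are invented for illustration, and base-10 log is an arbitrary choice; any base has the same compressing effect.

```python
import math

# Hypothetical rows; every column name here is an assumption.
rows = [
    {"asking": 200_000, "sold": 190_000, "debt": 5_000, "income": 50_000},
    {"asking": 150_000, "sold": 165_000, "debt": 30_000, "income": 60_000},
]

for row in rows:
    # abs(): size of the gap, whichever column is larger
    row["price_gap"] = abs(row["asking"] - row["sold"])
    # division: a ratio that neither column shows on its own
    row["debt_to_income"] = row["debt"] / row["income"]
    # log: compresses large values, so growth that multiplies becomes growth that adds
    row["log_income"] = math.log10(row["income"])
    # sqrt: a milder compression than log
    row["sqrt_income"] = math.sqrt(row["income"])
    # square: stretches large values further apart
    row["income_squared"] = row["income"] ** 2

for row in rows:
    print(row["price_gap"], row["debt_to_income"], row["income_squared"])
```

Log and square root pull large values closer together, while squaring pushes them further apart.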