Regular Expressions (RegEx)
Powerful and flexible language for finding, replacing, or extracting content from text.
Used to specify the context in which a pattern exists
Any text that has a predictable format to it can be extracted
The primary goal of using RegEx for data science is to extract instances of a specified pattern in a string,
allowing for the creation of a new column containing either a TRUE or a FALSE value
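As a minimal sketch of that idea, the snippet below flags which rows contain a pattern, using Python's standard `re` module. The sample rows and the phone-number pattern are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample data: free-text rows that may contain a phone number.
rows = ["call 555-1234 today", "no number here", "fax: 555-9876"]

# Three digits, a hyphen, then four digits.
phone = re.compile(r"\d{3}-\d{4}")

# One boolean per row, suitable for use as a new TRUE/FALSE column.
has_phone = [bool(phone.search(text)) for text in rows]
print(has_phone)  # [True, False, True]
```

The same list could be assigned to a new column of a data frame; that step is left out here to keep the example free of third-party libraries.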
Commands
555
Any literal sequence of digits may be specified; it matches exactly those digits
abc
Any literal sequence of letters may be specified; it matches exactly those letters
\d
any digit will do
\D
matches any character except a digit
\w
accepts any alphanumeric character or underscore (0-9, a-z, A-Z, _)
\W
matches any non-word character, such as & * % $
(.)
matches any single character (except a newline, by default)
(?)
makes the character directly preceding it optional for a match
\s
matches any whitespace character: a space, tab, newline, or carriage return
\S
anything but whitespace
[]
allows a set of permitted characters to be listed, e.g. [abc]
(^)
placed first inside [], matches anything except the characters listed, e.g. [^abc]
"|"
placed between two different patterns, matches either one
(^)
indicates the beginning of the string
(*)
declares that the regex pattern preceding it can be matched zero to an infinite number of times
allows for a regex pattern to be skipped entirely if it does not exist, or to be matched many times
\w*
matches a run of word characters (letters, digits, underscore) of any length, including zero
(+)
changes the lower bound from zero to one: the preceding pattern must occur at least once
"{}"
specifies exactly how many times a pattern can appear, optionally with lower and upper bounds, e.g. {3} or {2,5}
()
captures the substring that matches the enclosed pattern
commands
Requires that commands be general
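The commands listed above can be tried out with Python's standard `re` module. This is only a sketch: the sample strings and variable names are made up for illustration, and other regex engines may differ in small details.

```python
import re

sample = "cat 42 dog_7 & bird"  # hypothetical sample string

digits = re.findall(r"\d", sample)               # ['4', '2', '7']
non_digits = re.findall(r"\D", "a1b2")           # ['a', 'b']
words = re.findall(r"\w+", sample)               # ['cat', '42', 'dog_7', 'bird']
non_word = re.findall(r"\W", "a&b")              # ['&']
any_char = re.findall(r"c.t", "cat cot cut")     # ['cat', 'cot', 'cut']
optional = re.findall(r"colou?r", "color colour")  # ['color', 'colour']
tokens = re.findall(r"\S+", "a b\tc\nd")         # ['a', 'b', 'c', 'd']
vowels = re.findall(r"[aeiou]", "regex")         # ['e', 'e']
consonants = re.findall(r"[^aeiou]", "regex")    # ['r', 'g', 'x']
pets = re.findall(r"cat|dog", sample)            # ['cat', 'dog']

# ^ anchors the pattern to the beginning of the string.
starts_with_cat = bool(re.search(r"^cat", sample))  # True
starts_with_dog = bool(re.search(r"^dog", sample))  # False
```

Note that the caret has two roles: inside brackets it negates the set, while outside brackets it anchors the match to the start of the string.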
Escape Character
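the backslash (\) indicates that the next character in the regex pattern should be interpreted as a special command (e.g. \d), or, when that character is already special, as a literal (e.g. \. for a period)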
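A short sketch of the escape character in Python's `re` module follows. The strings are hypothetical; the point is only to contrast an unescaped period with an escaped one.

```python
import re

text = "3.14 3x14"  # hypothetical sample string

# Unescaped, the period is a command: it matches any character.
loose = re.findall(r"3.14", text)    # ['3.14', '3x14']

# Escaped, the period is a literal period.
strict = re.findall(r"3\.14", text)  # ['3.14']

# The same backslash turns the plain letter d into the digit command \d.
digit_runs = re.findall(r"\d+", text)  # ['3', '14', '3', '14']

# re.escape builds a pattern that matches a string literally.
literal = re.escape("3.14")
only_literal = re.findall(literal, text)  # ['3.14']
```

Raw strings (the `r"..."` prefix) keep Python from interpreting the backslash before the regex engine sees it.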
Captures
the explicit notation of any substring matching a regex pattern that can be used to create a new column for use in ML
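As an illustration of captures feeding a new column, the sketch below pulls a title out of name strings with Python's `re` module. The name format, the sample rows, and the variable names are assumptions made for this example.

```python
import re

# Hypothetical rows in "Surname, Title. Forename" form, plus one that does not fit.
names = ["Smith, Mr. John", "Doe, Mrs. Jane", "unparseable row"]

# Three capture groups: surname, title, forename.
pattern = re.compile(r"(\w+), (\w+)\. (\w+)")

titles = []
for name in names:
    match = pattern.search(name)
    # group(2) is the second pair of parentheses, i.e. the title.
    titles.append(match.group(2) if match else None)

print(titles)  # ['Mr', 'Mrs', None]
```

Rows that do not match yield `None`, which makes missing values explicit in the resulting column.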
Repetition
Occurrences in which it makes sense to specify that a pattern repeats a certain number of times within a string being analyzed
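The repetition commands (*, +, and {}) can be compared side by side. This is a minimal sketch using Python's `re` module with made-up sample strings.

```python
import re

text = "a ab abbb"  # hypothetical sample string

# * allows the preceding b to appear zero or more times.
zero_or_more = re.findall(r"ab*", text)  # ['a', 'ab', 'abbb']

# + requires the preceding b to appear at least once.
one_or_more = re.findall(r"ab+", text)   # ['ab', 'abbb']

# {2,3} sets a lower and an upper bound; fullmatch checks the whole string.
candidates = ["1", "12", "123", "1234"]
two_or_three_digits = [bool(re.fullmatch(r"\d{2,3}", c)) for c in candidates]
print(two_or_three_digits)  # [False, True, True, False]

# {3} demands exactly three repetitions.
exactly_three = [bool(re.fullmatch(r"\d{3}", c)) for c in candidates]
print(exactly_three)  # [False, False, True, False]
```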