Provost Chapter 10
basic steps to transform text
bag of words
treat documents as a collection of words
term frequency
how often a word appears in a document
combining term frequency with inverse document frequency
TFIDF
named entity extraction
topic models
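A minimal sketch of the bag-of-words, term frequency, and TFIDF steps above, using only the Python standard library. The function names, the whitespace tokenizer, and the two-document corpus are assumptions made for illustration, and the IDF form used here (1 + log of total documents over documents containing the term) is one common variant, not the only one.

```python
import math
from collections import Counter


def tokenize(text):
    # Bag of words: lowercase and split on whitespace, ignoring word order.
    # (Assumption: real pipelines usually also strip punctuation and stopwords.)
    return text.lower().split()


def term_frequency(tokens):
    # Raw count of each word within a single document.
    return Counter(tokens)


def inverse_document_frequency(corpus_tokens):
    # A term is rarer, and so weighted higher, when fewer documents contain it.
    n_docs = len(corpus_tokens)
    doc_counts = Counter()
    for tokens in corpus_tokens:
        doc_counts.update(set(tokens))  # count each term once per document
    return {term: 1 + math.log(n_docs / count)
            for term, count in doc_counts.items()}


def tfidf(corpus):
    # Combine the two: TFIDF(term, doc) = TF(term, doc) * IDF(term).
    corpus_tokens = [tokenize(doc) for doc in corpus]
    idf = inverse_document_frequency(corpus_tokens)
    return [{term: count * idf[term]
             for term, count in term_frequency(tokens).items()}
            for tokens in corpus_tokens]


# Hypothetical two-document corpus.
scores = tfidf(["the cat sat", "the dog barked"])
print(scores[0])
```

In this corpus "the" appears in both documents, so it scores lower than "cat", which appears in only one.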
data preparation
text data
text processing
why is it important?
text is everywhere
why it's difficult
unstructured data
doesn't have the sort of structure expected of data (records with fixed fields)
documents and text fields vary in length
word order sometimes matters
dirty, because people misspell words, write ungrammatically, and abbreviate unpredictably