Provost Chapter 10
Difficulty behind text
Some words matter, while others are useless
Text, just another form of data
The Internet is called new media, but it uses the same text forms as old media
Important for businesses, because text is how customers communicate
N-grams: easy to generate; require no linguistic knowledge or parsing algorithm
Useful when a phrase is significant, but its individual words are not
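A minimal sketch of n-gram generation, assuming naive lowercase whitespace tokenization (the helper names `ngrams` and `bag_of_ngrams` are made up for illustration). It shows why no linguistic knowledge or parser is needed: an n-gram is just a sliding window over adjacent tokens.

```python
def ngrams(text, n):
    """Return every run of n adjacent words in text, joined by spaces."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Collect all n-grams of length 1..max_n into one feature list."""
    features = []
    for n in range(1, max_n + 1):
        features.extend(ngrams(text, n))
    return features


sentence = "The quick brown fox jumps"
print(ngrams(sentence, 2))
# ['the quick', 'quick brown', 'brown fox', 'fox jumps']
```

Calling `bag_of_ngrams(sentence, 3)` on the same sentence returns 12 features (5 single words, 4 pairs, 3 triples), which hints at how quickly the feature set grows as n increases.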
Topic Models: add a topic layer between the document and the model, instead of a model that refers directly to words
First model the set of topics in the corpus separately, then map each word to one or more of those topics
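A rough illustration of the topic layer, not a real topic model: the word-to-topic table below is hand-written and hypothetical, whereas an actual topic model would learn the topics from the corpus. The sketch only shows the second step, where a document's words are mapped to topics and the topics, rather than the raw words, become the features.

```python
from collections import Counter

# Hypothetical, hand-written topic layer. A real topic model would learn
# this mapping from the corpus instead of having it specified by hand.
WORD_TO_TOPICS = {
    "goal": ["sports"],
    "match": ["sports"],
    "election": ["politics"],
    "vote": ["politics"],
    "win": ["sports", "politics"],  # a word may map to more than one topic
}


def topic_counts(text):
    """Count how often each topic is triggered by the words in text."""
    counts = Counter()
    for word in text.lower().split():
        # Words outside the table contribute to no topic.
        for topic in WORD_TO_TOPICS.get(word, []):
            counts[topic] += 1
    return counts


doc = "Win the vote and win the election"
counts = topic_counts(doc)
print(counts["politics"], counts["sports"])  # 4 2
```

A downstream classifier would then see the two topic counts rather than seven separate words.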