Provost 10
Text
Difficult because it is unstructured data (not numeric). It does have linguistic structure, but that structure is meant for humans, not computers
Representation
Bag of words- treat every document as a collection of individual words. Ignores grammar; straightforward and easy to generate, and usually works well for most tasks
Term frequency- word counts. Records how many times each word is used, rather than just whether it appears, which matters in some applications
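A minimal sketch of the two representations above, assuming a very simple tokenizer (lowercase, letters and apostrophes only). The function names `tokenize`, `bag_of_words`, and `term_frequency` and the sample sentence are made up for illustration and are not from the chapter.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    # Bag of words in its simplest form: which words occur, ignoring order and grammar.
    return set(tokenize(text))


def term_frequency(text):
    # Term frequency: how many times each word occurs in the document.
    return Counter(tokenize(text))


doc = "The cat sat on the mat. The mat was flat."
print(sorted(bag_of_words(doc)))
print(term_frequency(doc).most_common(2))
```

The set only says that "the" appears; the counter also says it appears three times, which is the extra information term frequency adds.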