[PPDM] Text - Coggle Diagram
[PPDM] Text
Preprocessing
parsing
getting the text
from documents, etc.
selecting
the interesting parts
header
paragraphs, etc.
lexical analysis
a.k.a. tokenization
stop-word removal
phrase detection
stemming & lemmatization
weighting
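The tokenization, stop-word removal, and stemming steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the `STOP_WORDS` set is a hypothetical, very short list, and `naive_stem` is a crude suffix stripper standing in for a real stemmer (for example Porter's algorithm). Parsing, phrase detection, lemmatization, and weighting are not covered here.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lexical analysis: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little content."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenization, stop-word removal, and stemming in order."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


tokens = preprocess("The cats are chasing the mice in the garden")
# tokens == ["cat", "chas", "mice", "garden"]
```

Note that the stem "chas" is not a dictionary word and "mice" is left untouched; lemmatization would instead map tokens to dictionary forms such as "chase" and "mouse".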
Bag-of-Words Representation
what is it
frequency of words
how many times a word appears in a document
why is it important
represents text in numerical form
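As a small sketch of the bag-of-words idea, the snippet below counts how many times each word appears in a document and lays the counts out as a vector over a shared vocabulary. The two example documents and the helper names `bag_of_words` and `to_vector` are made up for illustration, and the documents are assumed to be tokenized already.

```python
from collections import Counter


def bag_of_words(tokens):
    """Map each word to the number of times it appears in the document."""
    return Counter(tokens)


def to_vector(bag, vocabulary):
    """Turn a bag into a list of counts, one position per vocabulary word."""
    return [bag[word] for word in vocabulary]


# Two toy documents, already tokenized (assumed input).
docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"]]

# Shared vocabulary in a fixed (alphabetical) order.
vocabulary = sorted({word for doc in docs for word in doc})
vectors = [to_vector(bag_of_words(doc), vocabulary) for doc in docs]

# vocabulary == ["cat", "dog", "mat", "sat"]
# vectors == [[1, 0, 1, 1], [2, 1, 0, 0]]
```

Word order inside each document is discarded; only the counts survive, which is what makes the representation a "bag".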
Advanced Bag-of-Words
TF-IDF
word2vec
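For the TF-IDF item above, the following sketch re-weights raw word counts so that words occurring in many documents count for less. It assumes one common variant, with term frequency as count divided by document length and inverse document frequency as log(N / df); other variants add smoothing or use different logarithms. The function name `tf_idf` and the toy documents are illustrative only. word2vec is not sketched here because it requires training a model on a corpus.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights


docs = [["cat", "sat", "mat"], ["cat", "cat", "dog"]]
weights = tf_idf(docs)
# "cat" occurs in every document, so its weight is 0.0 in both;
# "dog" occurs only in the second document, so its weight there is log(2) / 3.
```

With this variant, a term present in every document gets zero weight regardless of how often it appears, which is the intended contrast with plain frequency counts.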