NLP (Methods (Stop Words Removal
(pre-defined list)
(stop words are…
- :no_entry: removes spaces (San Francisco, New York)
- :no_entry: removes punctuation (a problem for biomedical datasets)
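
A minimal sketch of stop-word removal against a pre-defined list. The `STOP_WORDS` set and the whitespace tokenizer are illustrative assumptions, not a list taken from any particular library. The example also shows the multi-word-name drawback noted above: splitting on spaces breaks "San Francisco" into two unrelated tokens.

```python
# Hypothetical, deliberately tiny stop-word list (real lists are much longer).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


tokens = remove_stop_words("The bridge is in San Francisco")
print(tokens)  # ['bridge', 'san', 'francisco']
```

The city name survives only as two separate tokens, so any later step that counts words no longer sees it as a single entity.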
Word normalization
:check: “running”, “runs” and “ran” => “run”
:no_entry: Caring => car
:no_entry: affixes can create or expand new forms of the same word (called inflectional affixes), or
even create new words themselves
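
The normalization examples above can be reproduced with a toy suffix stripper. This is a sketch under stated assumptions: the `IRREGULAR` lookup table, the `SUFFIXES` tuple, and the function name `naive_stem` are all hypothetical, and the rules are far cruder than a real stemmer or lemmatizer.

```python
# Hypothetical lookup for irregular forms that suffix rules cannot reach.
IRREGULAR = {"ran": "run"}
# Hypothetical suffix list, checked in order.
SUFFIXES = ("ing", "s")


def naive_stem(word):
    """Reduce a word to a crude stem by lookup, then suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Only strip when at least three characters would remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[: -len(suffix)]
            # Collapse a doubled final consonant left behind: "runn" -> "run".
            if word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


print([naive_stem(w) for w in ["running", "runs", "ran"]])  # ['run', 'run', 'run']
print(naive_stem("Caring"))  # 'car'
```

The last call shows the over-stemming problem: "caring" is reduced to "car", which is a different word.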
Predicting Parts of Speech for Each Token
(verb, adverb, noun, etc.)
Bag of Words
:no_entry: some words are not weighted accordingly
:no_entry: lack of semantic meaning
:no_entry: lack of context
:no_entry: stop words add noise
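
A small illustration of the bag-of-words drawbacks listed above, using only the standard library. The function name `bag_of_words` and the two sentences are made up for the example.

```python
from collections import Counter


def bag_of_words(text):
    """Count how often each lowercase, whitespace-separated token occurs."""
    return Counter(text.lower().split())


a = bag_of_words("the dog bites the man")
b = bag_of_words("the man bites the dog")

print(a == b)            # True: word order, and so context, is lost
print(a.most_common(1))  # [('the', 2)]: a stop word has the highest count
```

Two sentences with opposite meanings get identical representations, and the most frequent term carries no information about the content.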
:fire: TFIDF
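Term Frequency - Inverse Document Frequency
- Wij = Fij * log(N / Di), where Fij is the frequency of term i in document j, N is the number of documents, and Di is the number of documents containing term i
- No need to remove stop words
- Only need to remove punctuation and convert to lowercase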
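
A sketch of the weight formula above in plain Python. It assumes the D term means the number of documents that contain the term, and it uses raw counts for the term frequency; library implementations usually add smoothing and normalization, so their numbers will differ. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency within this document
        weights.append(
            {term: freq * math.log(n_docs / doc_freq[term])
             for term, freq in counts.items()}
        )
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
w = tf_idf(docs)
print(w[0])
```

Because "the" occurs in every document, its weight is log(3 / 3) = 0, which is why stop words do not need to be removed beforehand. The rarer "sat" gets the largest weight in the first document.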
Topic modeling
(each document consists of a mixture of topics
and that each topic consists of a set of words)
Explicit Semantic Analysis (ESA)
(how similar in meaning two words
or pieces of text are to each other)
:pencil2: Classifier Evaluation
(F-score): compare 2 different classifiers
F = 2pr / (p + r)
p: precision = number of correctly classified examples / total number of classified examples
r: recall = number of correctly classified examples /
the actual number of examples in the training set
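
A minimal sketch of the precision, recall and F computation above for a single positive class. The function name `precision_recall_f`, the `positive` parameter, and the label lists are assumptions made for illustration.

```python
def precision_recall_f(y_true, y_pred, positive=1):
    """Compute precision, recall and F for one class treated as positive."""
    # Examples predicted positive that really are positive.
    correct = sum(1 for t, p in zip(y_true, y_pred)
                  if p == positive and t == positive)
    predicted = sum(1 for p in y_pred if p == positive)  # all classified positive
    actual = sum(1 for t in y_true if t == positive)     # all truly positive

    precision = correct / predicted if predicted else 0.0
    recall = correct / actual if actual else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]
p, r, f = precision_recall_f(y_true, y_pred)
print(p, r, f)
```

Here 2 of the 3 positive predictions are correct (p = 2/3) and 2 of the 4 actual positives are found (r = 1/2), giving F = 4/7. Running the same function on the predictions of a second classifier gives a single number to compare the two.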