NLP (Methods (Stop Words Removal
(pre-defined list)
(stop word are…
NLP
Methods
Tokenization
- :no_entry: removes spaces (San Francisco, New York)
- :no_entry: removes punctuation (a problem for biomedical datasets)
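
The two pitfalls above can be seen in a minimal sketch, assuming the simplest possible tokenizers. The function names and the biomedical sentence are illustrative, not taken from any library or dataset.

```python
import re

def whitespace_tokenize(text):
    """Split on runs of whitespace only."""
    return text.split()

def strip_punctuation_tokenize(text):
    """Lowercase, turn every non-alphanumeric run into a space, then split."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()

# Multi-word place names are broken into unrelated tokens.
print(whitespace_tokenize("I flew from San Francisco to New York"))

# Hypothetical biomedical sentence: the hyphenated names lose their identity.
print(strip_punctuation_tokenize("IL-2 binds the IL-2R alpha chain."))
```

In the second call "IL-2" ends up as the two tokens "il" and "2", which is the kind of loss the biomedical note points at.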
Word normalization
Lemmatization
:check: “running”, “runs” and “ran” => “run”
:no_entry: “caring” => “car” (the result of naive suffix stripping; the correct lemma is “care”)
Stemming
:no_entry: affixes can create or expand new forms of the same word (inflectional affixes), or
even create new words themselves (derivational affixes)
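
A rough sketch of the difference between the two normalization methods, assuming a toy suffix-stripping stemmer and a hand-written lemma table. Both `naive_stem` and `LEMMAS` are hypothetical helpers for illustration; real stemmers and lemmatizers use far richer rules and dictionaries.

```python
def naive_stem(word, suffixes=("ing", "ed", "s")):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hand-written lemma table (an assumption; a real lemmatizer uses a full lexicon).
LEMMAS = {"running": "run", "runs": "run", "ran": "run", "caring": "care"}

def lookup_lemma(word):
    """Return the dictionary form if known, else the lowercased word."""
    return LEMMAS.get(word.lower(), word.lower())

for w in ["running", "runs", "ran", "caring"]:
    print(w, "stem:", naive_stem(w), "lemma:", lookup_lemma(w))
```

The stemmer only cuts characters, so it cannot map the irregular form "ran" to "run" and it truncates "caring" to "car", while the dictionary lookup handles both.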
Predicting Parts of Speech for Each Token
(verb, adverb, noun, etc.)
Algorithms
Bag of Words
:no_entry: some words are not weighted accordingly
:no_entry: no semantic meaning
:no_entry: no context
:no_entry: stop words add noise
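
The drawbacks listed above show up even in a minimal sketch, assuming a bag of words is just a count of lowercased, whitespace-separated tokens. The helper name `bag_of_words` and the sentences are illustrative.

```python
from collections import Counter

def bag_of_words(text):
    """Count lowercased whitespace-separated tokens; word order is discarded."""
    return Counter(text.lower().split())

a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# Opposite meanings, identical representation: context and semantics are lost.
print(a == b)

# The stop word "the" has the highest count, which is the noise noted above.
print(a.most_common(1))
```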
:fire: TFIDF
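Term Frequency - Inverse Document Frequency
- Wij = Fij * log(N / Di), where Fij is the frequency of term i in document j, N is the total number of documents, and Di is the number of documents containing term i
- No need to remove stop words
- Only need to remove punctuation and convert to lowercase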
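
A minimal sketch of the weight formula above, assuming the denominator inside the logarithm is the number of documents that contain the term and that raw counts serve as term frequency. The function name `tf_idf` and the three sample documents are illustrative; library implementations usually add smoothing and normalization.

```python
import math
import re
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    # Preprocessing as in the notes: lowercase and drop punctuation only.
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append({
            term: freq * math.log(n_docs / doc_freq[term])
            for term, freq in term_freq.items()
        })
    return weights

docs = ["The cat sat.", "The dog barked.", "The cat purred."]
w = tf_idf(docs)
print(w[0])
```

Because "the" occurs in every document, its weight is log(3 / 3) = 0, which is why explicit stop word removal becomes unnecessary with this weighting.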
Topic modeling
(each document consists of a mixture of topics
and that each topic consists of a set of words)
Explicit Semantic Analysis (ESA)
(how similar in meaning two words
or pieces of text are to each other)
Classifiers
:pencil2: Classifier Evaluation
(F-score): compares two different classifiers
F = 2pr / (p + r)
p: precision = number of correctly classified examples / total number of classified examples
r: recall = number of correctly classified examples /
the actual number of examples in the training set
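
The comparison described above can be sketched as follows, assuming binary labels where 1 marks the class of interest. The function name and the label lists are made up for illustration.

```python
def precision_recall_f(predicted, actual, positive=1):
    """Compute precision, recall and F = 2pr / (p + r) for one class."""
    true_pos = sum(1 for p, a in zip(predicted, actual)
                   if p == positive and a == positive)
    predicted_pos = sum(1 for p in predicted if p == positive)
    actual_pos = sum(1 for a in actual if a == positive)

    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical gold labels and the outputs of two classifiers.
actual = [1, 1, 1, 0, 0, 1]
classifier_a = [1, 1, 0, 0, 1, 0]
classifier_b = [1, 1, 1, 1, 1, 1]

print("A:", precision_recall_f(classifier_a, actual))
print("B:", precision_recall_f(classifier_b, actual))
```

Both classifiers have the same precision here, so the single F value is what separates them: it rewards the higher recall of the second one.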