Natural Language Processing Big Picture - Coggle Diagram
Natural Language Processing
Big Picture
big concept
how
NLP
works
tokenize text
splits sentences/text into words
word = token
e.g. "Mr. Jason goes to the beach" → Mr. | Jason | goes | to | the | beach
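The word tokenization above can be sketched in a few lines. This is a minimal, assumed implementation (real tokenizers have much larger abbreviation and punctuation rules); the `ABBREVIATIONS` set is a made-up toy list so that "Mr." stays one token:

```python
import re

# Known abbreviations keep their trailing period so "Mr." stays one token.
# Toy list for illustration only.
ABBREVIATIONS = {"Mr.", "Mrs.", "Dr."}

def tokenize(text):
    tokens = []
    for chunk in text.split():
        if chunk in ABBREVIATIONS:
            tokens.append(chunk)
        else:
            # split off punctuation from word characters
            tokens.extend(re.findall(r"\w+|[^\w\s]", chunk))
    return tokens

print(tokenize("Mr. Jason goes to the beach."))
# → ['Mr.', 'Jason', 'goes', 'to', 'the', 'beach', '.']
```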
pattern matching
finding patterns in a tokenized text
implementation of
Finite State Machine (FSM)
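Pattern matching over tokens can be hand-rolled as a tiny finite state machine (regexes compile to FSMs internally). A sketch, with a made-up two-state pattern "title followed by a capitalized name":

```python
# Toy FSM: two states, matching the token pattern  Title -> Capitalized word.
TITLES = {"Mr.", "Mrs.", "Dr."}

def find_title_name(tokens):
    """Return (title, name) pairs found by a two-state scan over tokens."""
    matches = []
    state, title = "START", None
    for tok in tokens:
        if state == "START" and tok in TITLES:
            title, state = tok, "SAW_TITLE"       # transition on a title token
        elif state == "SAW_TITLE" and tok[:1].isupper():
            matches.append((title, tok))          # accept: title + name
            state = "START"
        else:
            state = "START"                       # reset on any other token
    return matches

print(find_title_name(["Mr.", "Jason", "goes", "to", "the", "beach"]))
# → [('Mr.', 'Jason')]
```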
Common NLP Models
supervised sequential model
what is it
scans tokens in sequence
in order
models example
Hidden Markov Model (HMM)
Conditional Random Fields (CRF)
Maximum Entropy (MaxEnt)
Deep Learning (DL)
supervised non-sequential model
what is it
scans tokens without regard to their order
models example
Support Vector Machines (SVM)
Decision Trees (DT) / Random Forests (RF)
Naive Bayes
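A non-sequential model like Naive Bayes treats a sentence as a bag of words, ignoring order. A self-contained sketch with made-up toy training data (real systems would use scikit-learn or similar):

```python
import math
from collections import Counter

# Toy labeled data, invented for illustration.
train = [
    ("I love this movie", "pos"),
    ("great fun great acting", "pos"),
    ("I hate this movie", "neg"),
    ("boring and bad movie", "neg"),
]

word_counts = {"pos": Counter(), "neg": Counter()}
class_counts = Counter()
vocab = set()
for text, label in train:
    class_counts[label] += 1
    for w in text.lower().split():
        word_counts[label][w] += 1
        vocab.add(w)

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior + log likelihoods with add-one (Laplace) smoothing;
        # word order never enters the computation (non-sequential).
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("I love great acting"))  # → 'pos'
```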
Ambiguities in NLP
Lexical Ambiguity
what is it
a token that has different meanings in different contexts
example
"bank"
Syntactic Ambiguity
what is it
a sentence that has different possible meanings
the ambiguity comes from a difference in sentence structure
example
"Did you find the answer using google?"
"Did you find the answer that has google in it?"
Semantic Ambiguity
what is it
a sentence whose meaning is ambiguous at the interpretation (semantic) level
the ambiguity lies in the overall meaning, not in a single word or the sentence structure
example
"come join us in a cup of coffee"
main process of NLP
tokenization
by word
advantages
the most common
finding relations between pieces of information is easier
better performance (time-wise)
disadvantage
Out of Vocabulary Words
(OOV)
e.g. "running" != "run"
words in the test dataset that never appeared in the training dataset
solution:
by sub-word / by character
by sub-word
the best
uses
stemming
/
lemmatization
by character
advantages
no
OOV
disadvantages
hard to find relations between pieces of information
solution:
by sub-word
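The OOV trade-off between word-level and character-level tokenization can be shown concretely. A sketch with a made-up three-word training vocabulary:

```python
# Toy training vocabulary, invented for illustration.
train_vocab = {"run", "going", "walks"}
train_chars = set("".join(train_vocab))

def is_oov_word(word):
    # Word level: an unseen word form is out of vocabulary.
    return word not in train_vocab

def is_oov_chars(word):
    # Character level: OOV only if the word uses an unseen character.
    return not all(c in train_chars for c in word)

print(is_oov_word("running"))   # → True  ("running" != "run")
print(is_oov_chars("running"))  # → False (every character was seen in training)
```

Sub-word tokenization sits between the two: it reuses seen pieces (e.g. "run" + "ning"), keeping most of the word-level relations while avoiding most OOV cases.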
stemming
what is it
plainly chopping words into subwords
chops end/beginning of words (suffix/prefix)
advantage
performance (time-wise)
disadvantage
"riding" -> "rid"/"ridi"
chopped words could have no meaning at all
solution:
lemmatization
lemmatization
what is it
just like
stemming
chopped words must have meaning
advantage
results are real, meaningful words
disadvantage
worse performance (time-wise) than stemming
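The contrast between the two can be sketched in a few lines. Real systems use the Porter stemmer and a WordNet-style lemmatizer; the suffix list and lemma table below are made-up toy data:

```python
# Naive suffix-chopping stemmer vs. dictionary-based lemmatizer (sketch).
SUFFIXES = ["ing", "ed", "s"]                     # toy suffix list
LEMMAS = {"riding": "ride", "geese": "goose"}     # toy lemma table

def stem(word):
    # Plainly chop a known suffix off the end; the result may not be a word.
    for suf in SUFFIXES:
        if word.endswith(suf):
            return word[: -len(suf)]
    return word

def lemmatize(word):
    # Unlike stemming, the output must be a real word (dictionary lookup).
    return LEMMAS.get(word, word)

print(stem("riding"))       # → 'rid'  (not a meaningful word)
print(lemmatize("riding"))  # → 'ride' (a real word)
```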
POS tags
what is it
classifying each word in a sentence into its part of speech
noun, verb, adjective, etc.
why we need it
gives the machine a better understanding of the sentence
avoids misunderstanding
"I love you honey" and "Honey let's make love" are different: "love" is a verb in the first, a noun in the second
how to implement
Hidden Markov Model
named entity recognition
chunking
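The HMM idea can be sketched with toy numbers: the tag of a word depends both on the word itself (emission) and on the previous tag (transition), which is what disambiguates "love" as verb vs. noun. All probabilities below are made-up illustration values, and the decoding is greedy rather than full Viterbi:

```python
EMIT = {  # P(word | tag), toy values
    "love": {"VERB": 0.7, "NOUN": 0.3},
    "honey": {"NOUN": 1.0},
    "i": {"PRON": 1.0}, "you": {"PRON": 1.0},
    "let's": {"VERB": 1.0}, "make": {"VERB": 1.0},
}
TRANS = {  # P(tag | previous tag), toy values
    ("START", "PRON"): 0.6, ("START", "NOUN"): 0.4,
    ("PRON", "VERB"): 0.8, ("PRON", "NOUN"): 0.2,
    ("VERB", "NOUN"): 0.7, ("VERB", "PRON"): 0.3, ("VERB", "VERB"): 0.2,
    ("NOUN", "VERB"): 0.5, ("NOUN", "NOUN"): 0.2, ("NOUN", "PRON"): 0.3,
}

def greedy_tag(tokens):
    """Greedy decoding: at each step pick the tag maximizing
    emission * transition from the previous tag."""
    prev, out = "START", []
    for tok in tokens:
        cands = EMIT[tok.lower()]
        best = max(cands, key=lambda t: cands[t] * TRANS.get((prev, t), 0.01))
        out.append((tok, best))
        prev = best
    return out

print(greedy_tag(["I", "love", "you", "honey"]))
# "love" after a pronoun → VERB
print(greedy_tag(["Honey", "let's", "make", "love"]))
# "love" after "make" (a verb) → NOUN
```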