Natural Language Processing
Represent meaning
One hot encoding
Huge Vector
No natural notion of similarity
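A minimal sketch of the problem (toy three-word vocabulary, assumed): distinct one-hot vectors are orthogonal, so dot-product similarity between any two different words is always zero.

```python
import numpy as np

vocab = ["hotel", "motel", "cat"]  # toy vocabulary (illustrative)
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

print(one_hot["hotel"])                     # [1. 0. 0.] -- one dimension per word, huge for real vocabularies
print(one_hot["hotel"] @ one_hot["motel"])  # 0.0 -> no natural notion of similarity
```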
Distributional Semantic
A word's meaning is given by the words that frequently appear close-by
Word Embedding (Word Vector)
Build a dense vector for each word, so that words appearing in similar contexts end up with similar vectors
Word2Vec
Framework for learning word vectors
For each position t: predict context words within a window of fixed size m,
given center word w_t
Skip Gram
Train with Negative Sampling: train binary logistic regression (sigmoid)
True pair: center + word in context
Noise pair: center + random word
Maximize probability of the true pair co-occurring +
Minimize probability of the noise pairs (sketch of the objective below)
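A minimal numpy sketch of the negative-sampling objective; the vectors and the number of noise samples K below are random stand-ins, not trained values.

```python
# Skip-gram with negative sampling (illustrative names):
# maximize log sigma(u_o . v_c) for the true (center, context) pair and
# log sigma(-u_k . v_c) for K sampled noise words.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_loss(v_c, u_o, u_noise):
    """v_c: center vector, u_o: true context vector, u_noise: (K, d) noise vectors."""
    true_term = np.log(sigmoid(u_o @ v_c))               # push the true pair together
    noise_term = np.log(sigmoid(-u_noise @ v_c)).sum()   # push noise pairs apart
    return -(true_term + noise_term)                      # negative log-likelihood to minimize

d, K = 50, 5
rng = np.random.default_rng(0)
loss = neg_sampling_loss(rng.normal(size=d), rng.normal(size=d), rng.normal(size=(K, d)))
```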
P(w_{t+i} | w_t; theta)
P(context | center) = exp(u_context . v_center) / sum over all words w of exp(u_w . v_center) (softmax over dot products)
Simple but expensive to train
2 vectors for each word
v: when center
u: when context
theta: all the u and v vectors of every word, stacked into one parameter vector
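A minimal numpy sketch of the skip-gram softmax under the two-vector scheme above; U and v_c are random stand-ins. The sum over the whole vocabulary in the denominator is what makes the plain softmax expensive to train.

```python
import numpy as np

def p_context_given_center(U, v_c, o):
    """U: (V, d) context vectors, v_c: (d,) center vector, o: index of the context word."""
    scores = U @ v_c
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # denominator runs over all V words
    return probs[o]

V, d = 10000, 100
rng = np.random.default_rng(1)
U, v_c = rng.normal(size=(V, d)), rng.normal(size=d)
print(p_context_given_center(U, v_c, o=42))
```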
Count Based
Build co-occurrence matrix
Use a window around each word to capture some syntactic and semantic information
Vector size increases with vocabulary size
High-dimensional and sparse -> less robust models
Need improvements
Singular Value Decomposition (SVD) to reduce dimensionality (sketch after this list)
Scale counts
log(freq)
min(count, t=100)
Ignore function words
Use Pearson correlations instead of counts
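A minimal count-based sketch on a toy two-sentence corpus (window of 1, assumed): build the co-occurrence matrix, scale counts with a log, and reduce dimensionality with SVD.

```python
import numpy as np

corpus = [["i", "like", "nlp"], ["i", "like", "deep", "learning"]]  # toy corpus
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Co-occurrence counts within a window of 1 around each word
M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    for i, w in enumerate(sent):
        for j in range(max(0, i - 1), min(len(sent), i + 2)):
            if j != i:
                M[idx[w], idx[sent[j]]] += 1

M = np.log1p(M)                    # scale raw counts: log(1 + freq)
U, S, Vt = np.linalg.svd(M)        # SVD to reduce dimensionality
k = 2
word_vectors = U[:, :k] * S[:k]    # dense k-dimensional vectors, one row per word
```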
GloVe
Ratio of co-occurrence probabilities
can encode meaning components
Log bilinear model: w_i . w_j = log(P(i | j))
with vector differences: w_x . (w_a - w_b) = log(P(x | a) / P(x | b)) (objective sketch after this block)
Advantages
Good performance even with small corpus and small vectors
Fast training
Scalable to huge corpora
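A sketch of the GloVe weighted least-squares objective on toy tensors; the weighting function f(x) = min((x / x_max)^alpha, 1) and the bias terms follow the original GloVe formulation, and all values below are random stand-ins.

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, X, x_max=100.0, alpha=0.75):
    """J = sum_ij f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2"""
    f = np.minimum((X / x_max) ** alpha, 1.0)           # weighting function, caps frequent pairs
    with np.errstate(divide="ignore"):
        log_X = np.where(X > 0, np.log(X), 0.0)         # zero-count pairs get zero weight anyway
    diff = W @ W_tilde.T + b[:, None] + b_tilde[None, :] - log_X
    return np.sum(f * diff ** 2)

V, d = 20, 8
rng = np.random.default_rng(2)
X = rng.poisson(3.0, size=(V, V)).astype(float)         # toy co-occurrence counts
loss = glove_loss(rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                  np.zeros(V), np.zeros(V), X)
```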
WordNet
Synonym sets
Hypernyms ("is a" relation)
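A quick WordNet lookup sketch via NLTK (assumes the corpus has been fetched once with nltk.download("wordnet")); synsets group synonyms, and hypernyms give the "is a" relation.

```python
from nltk.corpus import wordnet as wn

dog = wn.synsets("dog")[0]   # first synset for "dog"
print(dog.lemma_names())     # synonym set, e.g. ['dog', 'domestic_dog', ...]
print(dog.hypernyms())       # "is a" parents, e.g. [Synset('canine.n.02'), ...]
```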
Dependency Structure:
which words depend on which other words
(modify, attach to, or are arguments of)
Context Free Grammar
Given a sequence of words
need to figure out the underlying structure -> get the meaning
Syntactic structure consists of relations between lexical items -> forms a tree (toy parse sketch below)
-> Treebank
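A toy constituency-parsing sketch with NLTK: given a sequence of words and a hand-written CFG (both illustrative), the chart parser recovers the tree structure behind the sentence.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'dog' | 'cat'
V -> 'chased'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("the dog chased the cat".split()):
    print(tree)  # (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det the) (N cat))))
```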
Dependency Parsing
Parse a sentence by choosing for each word what other word it is a dependent of
Dynamic Programming
Graph Algorithms
Compute a score for every possible dependency (head -> dependent arc)
Repeat for each word -> build the highest-scoring spanning tree (sketch below)
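An illustrative sketch of the scoring step with random arc scores: every possible head -> dependent arc gets a score and each word keeps its best head. A real graph-based parser would run a spanning-tree algorithm (e.g. Chu-Liu/Edmonds) over these scores to guarantee a well-formed tree; the per-word argmax below does not.

```python
import numpy as np

words = ["ROOT", "she", "saw", "video", "lecture"]        # toy sentence
rng = np.random.default_rng(3)
scores = rng.normal(size=(len(words), len(words)))        # scores[h, d]: arc from head h to dependent d

heads = {}
for d in range(1, len(words)):                            # every word except ROOT needs a head
    candidates = [h for h in range(len(words)) if h != d]
    heads[words[d]] = words[max(candidates, key=lambda h: scores[h, d])]
print(heads)                                               # best-scoring head per word (not guaranteed a tree)
```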
Constraint Satisfaction
Transition-based parsing
Greedy Transition-based parsing
Shift / Left-Arc / Right-Arc from Stack + Buffer
Train a classifier to predict the next action
Linear time
Softmax classifier gives only linear separation -> not expressive enough
Use a neural classifier: more complex, non-linear decision boundaries (transition-loop sketch below)
Neural Dependency Parser
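A minimal sketch of the greedy transition loop (shift-reduce over a stack and buffer). The predict_action stub is a hypothetical placeholder; a neural dependency parser would replace it with a learned, non-linear classifier over stack/buffer features, so the arcs printed below reflect the stub's rule, not real syntax.

```python
def predict_action(stack, buffer):
    """Hypothetical stub: a trained classifier would score SHIFT / LEFT-ARC / RIGHT-ARC."""
    if len(stack) >= 2 and not buffer:
        return "RIGHT-ARC"
    return "SHIFT" if buffer else "RIGHT-ARC"

def parse(words):
    stack, buffer, arcs = ["ROOT"], list(words), []
    while buffer or len(stack) > 1:
        action = predict_action(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))            # move next word onto the stack
        elif action == "LEFT-ARC":                 # head = top of stack, dependent = second
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                                      # RIGHT-ARC: head = second, dependent = top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

print(parse(["I", "ate", "fish"]))  # linear number of transitions in sentence length
```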