AI
DL
NLP
word2vec app
representation
distributional similarity (a word is represented by its neighbours / its context)
algorithms
direct prediction
skip-gram
probability in softmax form over two sets of vectors (context and center; using two vectors per word makes the math easier): given the center word, predict the probability of each context word, normalised over the whole vocab
softmax
the exponential makes bigger scores dominate, pushing the result toward the max, while still not one-hot; hence the name "soft" max
d is the vector dimension, V is the vocab size; parameter size: 2dV (a center and a context vector for every word)
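The softmax probability above, written out (v_c = center-word vector, u_o = outside/context-word vector; this is the standard skip-gram form):

P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w=1}^{V} \exp(u_w^\top v_c)}

For intuition on the "soft" part: softmax([3, 1, 0]) ≈ [0.84, 0.11, 0.04], so the largest score dominates but the output is still not one-hot.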
problems
gradient updates are very sparse: each step only touches the words that appear in the window
can use sparse techniques: only record / apply the non-zero updates
computing the denominator of the probability (for SGD this is really computing the expected outside-word vector) requires a sum over every word in the vocab
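A minimal numpy sketch of why that denominator is the bottleneck: scoring a single (center, outside) pair already needs a dot product with all V context vectors (sizes and names here are illustrative):

import numpy as np

V, d = 10_000, 100                    # vocab size, embedding dimension
v = np.random.randn(V, d) * 0.01      # center-word vectors
u = np.random.randn(V, d) * 0.01      # outside/context-word vectors

def softmax_loss(center, outside):
    scores = u @ v[center]            # O(V*d): the expensive denominator work
    scores -= scores.max()            # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return -np.log(probs[outside])

print(softmax_loss(3, 17))            # loss for one (center, outside) pair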
Negative Sampling:
use the objective function: log σ(u_o · v_c) plus, summed over ~10 randomly sampled vocab words k, log σ(−u_k · v_c) (written out below)
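The same objective written out as a loss to minimise (K ≈ 10 negatives; sampling them from the unigram distribution raised to the 3/4 power is the standard word2vec choice, assumed here):

J_{neg}(o, v_c, U) = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_k^\top v_c), \quad k \sim P(w) \propto \mathrm{count}(w)^{3/4}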
count-based
co-occurrence matrix
hacks
cap the counts of overly frequent words at some maximum, say 100 (see the sketch below)
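A minimal sketch of building a window-based co-occurrence matrix with this cap (the toy corpus, window size 1, and the cap value are illustrative):

import numpy as np

corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]
window = 1
vocab = sorted({w for sent in corpus for w in sent.split()})
idx = {w: i for i, w in enumerate(vocab)}

M = np.zeros((len(vocab), len(vocab)))
for sent in corpus:
    words = sent.split()
    for i, w in enumerate(words):
        lo, hi = max(0, i - window), min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                M[idx[w], idx[words[j]]] += 1

M = np.minimum(M, 100)  # the hack: cap counts of overly frequent words
print(vocab)
print(M)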
use cases
but for some other tasks, such as sentiment analysis, this does not work as well
dependency parsing
dependency structure
a preposition can attach to a noun several words back, but the dependencies will most probably form a nested structure
source of information
intervening material (verbs, punctuation)
dependency parsing
if any dependency arcs cross (which depends on the linear order of the words), the parse is non-projective
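A small sketch of that crossing-arcs test (heads is a hypothetical list where heads[i-1] is the head of word i, with 0 standing for the root):

def is_projective(heads):
    # two arcs cross iff exactly one endpoint of one arc lies
    # strictly inside the span of the other
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for i, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

print(is_projective([2, 0, 2]))     # nested arcs -> True (projective)
print(is_projective([3, 4, 0, 3]))  # crossing arcs -> False (non-projective)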
TensorFlow
flow graph
nodes (operations, with any number of inputs and outputs)
code
lazy evaluation: building the graph does not run anything; computation happens only when session.run is called
prediction -> loss function; the ground-truth labels are fed in as another placeholder
optimizer.minimize(loss function) adds the backpropagation ops; session.run is called on the resulting train op
every iteration of the training loop performs one update to W and b
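A minimal TF1-style sketch of this whole pattern (a linear model; the names x, y, W, b and the learning rate are illustrative):

import numpy as np
import tensorflow.compat.v1 as tf  # TF1 graph/session API
tf.disable_eager_execution()

# placeholders: inputs and ground-truth labels are both fed from outside
x = tf.placeholder(tf.float32, [None, 3])
y = tf.placeholder(tf.float32, [None, 1])

# variables: the parameters the optimizer will update
W = tf.Variable(tf.zeros([3, 1]))
b = tf.Variable(tf.zeros([1]))

pred = tf.matmul(x, W) + b                   # prediction node
loss = tf.reduce_mean(tf.square(pred - y))   # loss node
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)  # adds backprop ops

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(100):
        # one run of train_op = one gradient update to W and b
        sess.run(train_op, feed_dict={x: np.random.rand(8, 3),
                                      y: np.random.rand(8, 1)})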
RNN and language model
RNN
h_t = \sigma(W_{hh} h_{t-1} + W_{hx} x_t), \qquad \hat{y}_t = \mathrm{softmax}(W_S h_t)
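A direct numpy transcription of that step, assuming tanh as the non-linearity (dimensions are illustrative):

import numpy as np

d_h, d_x, V = 50, 100, 10_000
W_hh = np.random.randn(d_h, d_h) * 0.01  # hidden -> hidden
W_hx = np.random.randn(d_h, d_x) * 0.01  # input  -> hidden
W_S  = np.random.randn(V, d_h) * 0.01    # hidden -> vocab scores

def rnn_step(h_prev, x_t):
    h_t = np.tanh(W_hh @ h_prev + W_hx @ x_t)   # hidden-state update
    scores = W_S @ h_t
    e = np.exp(scores - scores.max())
    y_hat = e / e.sum()                         # softmax over next-word classes
    return h_t, y_hat

h, y_hat = rnn_step(np.zeros(d_h), np.random.randn(d_x))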
the cost function is still cross-entropy, but the classes are the possible next words; since the labels come from the raw text itself, this is effectively unsupervised ML
for the overall evaluation we use 2^(cross-entropy), which is the perplexity
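Written out (T time steps, a V-way classification at each step; the base-2 form assumes the cross-entropy is measured in bits):

J = -\frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{V} y_{t,j} \log_2 \hat{y}_{t,j}, \qquad \mathrm{Perplexity} = 2^{J}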
bidirectional RNN
there are two hidden states at each step: one updated from h_{t-1} (forward) and one updated from h_{t+1} (backward); ŷ_t is computed from the concatenation of the two
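In equations (arrow superscripts mark the forward and backward states; f is the non-linearity, g the output layer, [;] concatenation, all assumed notation):

\overrightarrow{h}_t = f(\overrightarrow{W} x_t + \overrightarrow{V} \overrightarrow{h}_{t-1} + \overrightarrow{b})
\overleftarrow{h}_t = f(\overleftarrow{W} x_t + \overleftarrow{V} \overleftarrow{h}_{t+1} + \overleftarrow{b})
\hat{y}_t = g(U [\overrightarrow{h}_t ; \overleftarrow{h}_t] + c)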