DL(18) Transformer
Self Attention
steps 3) and 4): divide the scores by the square root of d_k (the dimension of the key vectors), then pass the result through a softmax operation
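A minimal NumPy sketch of the scaled dot-product self-attention described above (steps 3 and 4 are the division by the square root of d_k and the softmax); the projection matrices, token count, and dimensions are illustrative assumptions, not values from this note.

```python
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T                                     # score every query against every key
    weights = softmax(scores / np.sqrt(d_k), axis=-1)    # steps 3) and 4): scale, then softmax
    return weights @ V                                   # weighted sum of the value vectors

# toy example: 4 tokens, d_model = d_k = d_v = 8 (illustrative sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                              # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)                                         # (4, 8): one output vector per token
```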
Multi-Head Attention
e.g., one head focusing on "the animal", another head focusing on "tired"
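A sketch of multi-head attention under the same illustrative assumptions: each head has its own projection matrices and attends independently (so one head can focus on "the animal" while another focuses on "tired"), and the concatenated head outputs are projected back with an assumed output matrix W_O.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention, as in the previous sketch
    return softmax(Q @ K.T / np.sqrt(K.shape[-1]), axis=-1) @ V

def multi_head_attention(X, heads, W_O):
    # each head has its own W_q, W_k, W_v and attends independently,
    # e.g. one head may focus on "the animal" while another focuses on "tired"
    outputs = [attention(X @ W_q, X @ W_k, X @ W_v) for W_q, W_k, W_v in heads]
    # concatenate the head outputs and project back to the model dimension
    return np.concatenate(outputs, axis=-1) @ W_O

# illustrative sizes: 4 tokens, d_model = 8, 2 heads with d_k = d_v = 4
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
heads = [tuple(rng.normal(size=(8, 4)) for _ in range(3)) for _ in range(2)]
W_O = rng.normal(size=(2 * 4, 8))
print(multi_head_attention(X, heads, W_O).shape)   # (4, 8)
```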
Decoding
K_encdec and V_encdec (keys and values computed from the encoder output) help the decoder focus on appropriate places in the input sequence
to prevent leftward information flow in the decoder, all values corresponding to illegal connections are masked out (set to -infinity) before the softmax
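A sketch of the two decoder-side attention steps under the same illustrative assumptions (per-head projection matrices are omitted for brevity): masked self-attention sets scores for illegal, leftward connections to -infinity so they receive zero weight after softmax, while encoder-decoder attention takes its queries from the decoder and its K_encdec and V_encdec from the encoder output.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    if mask is not None:
        # illegal (leftward) connections get -inf, so softmax gives them zero weight
        scores = np.where(mask, scores, -np.inf)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
enc_out = rng.normal(size=(5, 8))   # encoder output: 5 source tokens (illustrative)
dec_in  = rng.normal(size=(3, 8))   # decoder states: 3 target tokens generated so far

# 1) masked self-attention: position i may only attend to positions <= i
causal_mask = np.tril(np.ones((3, 3), dtype=bool))
self_out = attention(dec_in, dec_in, dec_in, mask=causal_mask)

# 2) encoder-decoder attention: queries come from the decoder,
#    K_encdec and V_encdec come from the encoder output (no mask needed)
K_encdec, V_encdec = enc_out, enc_out
cross_out = attention(self_out, K_encdec, V_encdec)
print(cross_out.shape)              # (3, 8)
```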
Linear and Softmax
the decoder stack outputs a vector of floats; the Linear and Softmax layers turn that vector into a word (the Linear layer projects to logits over the vocabulary, and Softmax turns the logits into probabilities)
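A sketch of this final step with a toy vocabulary (an assumption for illustration): the Linear layer projects the decoder's output vector to one logit per vocabulary word, and Softmax turns the logits into probabilities from which the output word is chosen.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "animal", "was", "too", "tired", "<eos>"]   # toy vocabulary (assumption)
d_model = 8

decoder_output = rng.normal(size=(d_model,))        # vector of floats from the decoder stack
W_linear = rng.normal(size=(d_model, len(vocab)))   # Linear layer weights
b_linear = np.zeros(len(vocab))

# Linear layer: project to one score (logit) per vocabulary word
logits = decoder_output @ W_linear + b_linear
# Softmax: turn the logits into a probability distribution over the vocabulary
probs = np.exp(logits - logits.max())
probs /= probs.sum()
# the word with the highest probability is emitted for this time step
print(vocab[int(np.argmax(probs))])
```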