Vision Transformer
self-attention (sketch below)
multiplicative attention - Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015); sketch below
scaled dot-product attention - Attention Is All You Need (Vaswani et al., 2017); sketch below
positional encoding (sketch below)
feature / patch embedding (sketch below)
local window attention (sketch below)
data-efficient training
distributed machine learning (sketch below)
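
For the self-attention and scaled dot-product items above: a minimal NumPy sketch of single-head scaled dot-product self-attention, softmax(Q K^T / sqrt(d_k)) V, as defined in Attention Is All You Need. The array shapes, weight initialisation, and function names are illustrative assumptions, not taken from any particular implementation.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, w_q, w_k, w_v):
        """x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_k)."""
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(q.shape[-1])    # (seq_len, seq_len) similarity of every token pair
        weights = softmax(scores, axis=-1)         # each row is a distribution over all tokens
        return weights @ v                         # (seq_len, d_k) weighted mix of values

    rng = np.random.default_rng(0)
    x = rng.normal(size=(16, 64))                  # 16 tokens, d_model = 64
    w_q, w_k, w_v = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
    print(self_attention(x, w_q, w_k, w_v).shape)  # (16, 64)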
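
For the multiplicative item: a sketch of the bilinear ("general") score from Luong et al., score(h_t, h_s) = h_t^T W_a h_s, used to weight encoder states into a context vector. Variable names and dimensions are assumptions for illustration.

    import numpy as np

    def multiplicative_attention(h_t, h_s, w_a):
        """h_t: (d,) decoder state; h_s: (src_len, d) encoder states; w_a: (d, d)."""
        scores = h_s @ (w_a.T @ h_t)               # (src_len,) bilinear scores h_t^T W_a h_s
        e = np.exp(scores - scores.max())
        weights = e / e.sum()                      # alignment weights over source positions
        context = weights @ h_s                    # (d,) context vector
        return context, weights

    rng = np.random.default_rng(0)
    h_t = rng.normal(size=128)
    h_s = rng.normal(size=(10, 128))
    w_a = rng.normal(size=(128, 128)) * 0.05
    context, weights = multiplicative_attention(h_t, h_s, w_a)
    print(context.shape, round(weights.sum(), 3))  # (128,) 1.0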
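
For positional encoding: a sketch of the fixed sinusoidal encoding from Attention Is All You Need. ViT itself typically uses learned position embeddings; the sinusoidal form is shown here only as the standard reference formulation, with illustrative sizes.

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, d_model):
        """Returns (seq_len, d_model): sin on even dimensions, cos on odd dimensions."""
        pos = np.arange(seq_len)[:, None]                     # (seq_len, 1)
        i = np.arange(d_model // 2)[None, :]                  # (1, d_model // 2) pair index
        angles = pos / np.power(10000.0, 2 * i / d_model)     # angle per (position, dimension pair)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    pe = sinusoidal_positional_encoding(seq_len=197, d_model=64)   # e.g. 196 patches + class token
    print(pe.shape)                                                # (197, 64)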
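
For the feature / patch embedding item: a sketch of ViT-style patch embedding, where the image is cut into non-overlapping P x P patches and each flattened patch is linearly projected to the model dimension. Image size, patch size, and d_model below are illustrative assumptions.

    import numpy as np

    def patch_embed(img, patch, w_proj):
        """img: (H, W, C); patch: side length P; w_proj: (P * P * C, d_model)."""
        h, w, c = img.shape
        gh, gw = h // patch, w // patch                       # patches per column / row
        patches = (img[:gh * patch, :gw * patch]
                   .reshape(gh, patch, gw, patch, c)
                   .transpose(0, 2, 1, 3, 4)                  # (gh, gw, P, P, C)
                   .reshape(gh * gw, patch * patch * c))      # one flattened row per patch
        return patches @ w_proj                               # (num_patches, d_model)

    rng = np.random.default_rng(0)
    img = rng.normal(size=(224, 224, 3))                      # e.g. a 224 x 224 RGB image
    w_proj = rng.normal(size=(16 * 16 * 3, 64)) * 0.02        # 16 x 16 patches -> d_model = 64
    print(patch_embed(img, patch=16, w_proj=w_proj).shape)    # (196, 64)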
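
For local window attention: a sketch that partitions the token sequence into fixed-size non-overlapping windows and runs scaled dot-product attention inside each window only, so cost grows with window size rather than sequence length. Real implementations usually use 2D spatial windows (often with shifting); the 1D partition and sizes here are simplifying assumptions.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def window_attention(x, window, w_q, w_k, w_v):
        """x: (seq_len, d_model) with seq_len divisible by window; no cross-window mixing."""
        n, d = x.shape
        xw = x.reshape(n // window, window, d)                # (num_windows, window, d)
        q, k, v = xw @ w_q, xw @ w_k, xw @ w_v
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
        out = softmax(scores, axis=-1) @ v                    # attention within each window
        return out.reshape(n, -1)                             # back to (seq_len, d_k)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(196, 64))                            # e.g. 14 x 14 patch tokens
    w_q, w_k, w_v = (rng.normal(size=(64, 64)) * 0.1 for _ in range(3))
    print(window_attention(x, window=49, w_q=w_q, w_k=w_k, w_v=w_v).shape)   # (196, 64)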
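
For distributed machine learning: the item names a broad area, so the sketch below shows only one common pattern, synchronous data parallelism, with the all-reduce simulated inside a single process by averaging per-worker gradients. The linear model, data, and worker count are made up for illustration and do not reflect any specific framework's API.

    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=5)                                 # shared model parameters (linear model)
    X = rng.normal(size=(8, 5))                            # full batch, split across workers
    y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])           # synthetic regression targets

    def local_gradient(w, X_shard, y_shard):
        """Mean-squared-error gradient computed on one worker's shard."""
        err = X_shard @ w - y_shard
        return 2.0 * X_shard.T @ err / len(y_shard)

    shards = np.array_split(np.arange(len(y)), 4)          # 4 simulated workers, 2 examples each
    grads = [local_gradient(w, X[i], y[i]) for i in shards]
    avg_grad = np.mean(grads, axis=0)                      # simulated all-reduce: average gradients
    w = w - 0.1 * avg_grad                                 # every worker applies the same update
    print(avg_grad.shape)                                  # (5,)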