Knowledge Distillation
  knowledge (loss sketches below)
    logit
      Distilling the Knowledge in a Neural Network
    intermediate weights
      FitNets: Hints for Thin Deep Nets
    intermediate features
      Like What You Like: Knowledge Distill via Neuron Selectivity Transfer
      Paraphrasing Complex Network: Network Compression via Factor Transfer
    intermediate gradients / attentions
      Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
    sparsity
      Knowledge Transfer via Distillation of Activation Boundaries Formed by Hidden Neurons
    relational information
      A Gift from Knowledge Distillation: Fast Optimization, Network Minimization and Transfer Learning
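The most common knowledge sources in this branch map to simple losses. A minimal PyTorch sketch, assuming generic (B, C) logit tensors and (B, C, H, W) feature maps; the `regressor` used for the FitNets-style hint is a hypothetical adapter module, not code from any of the papers above:

```python
import torch
import torch.nn.functional as F

def logit_kd_loss(student_logits, teacher_logits, T=4.0):
    """Soft-target loss on logits: KL between temperature-softened distributions."""
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)

def hint_loss(student_feat, teacher_feat, regressor):
    """FitNets-style hint: match an intermediate student feature map to the
    teacher's through a small regressor that fixes the channel mismatch."""
    return F.mse_loss(regressor(student_feat), teacher_feat)

def attention_transfer_loss(student_feat, teacher_feat):
    """Attention transfer: match normalized spatial attention maps, obtained
    by pooling squared activations over the channel dimension."""
    def att(feat):  # (B, C, H, W) -> (B, H*W), L2-normalized per sample
        a = feat.pow(2).mean(dim=1).flatten(1)
        return F.normalize(a, dim=1)
    return (att(student_feat) - att(teacher_feat)).pow(2).mean()
```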
  distillation (training-loop sketch below)
    self distillation
      Born Again Neural Networks
    online distillation
      Online Deep Metric Learning via Mutual Distillation
    combined
      Knowledge Distillation by On-the-Fly Native Ensemble
      Be Your Own Teacher
    offline distillation
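A rough sketch of how the offline and online regimes differ in the training loop, assuming hypothetical `teacher`, `student_a`, `student_b` modules and an optimizer `opt` over the trainable student parameters; this illustrates the split above, not the exact procedure of any listed paper:

```python
import torch
import torch.nn.functional as F

def soft_loss(s_logits, t_logits, T=4.0):
    """KL between temperature-softened student and teacher distributions."""
    return F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T

def offline_step(teacher, student, opt, x, y, alpha=0.5, T=4.0):
    """Offline distillation: the teacher is pretrained and frozen,
    only the student is updated."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = alpha * F.cross_entropy(s_logits, y) \
         + (1 - alpha) * soft_loss(s_logits, t_logits, T)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def online_step(student_a, student_b, opt, x, y, T=4.0):
    """Online (mutual) distillation: two students train simultaneously and
    each uses the other's current predictions as soft targets.
    `opt` is assumed to cover the parameters of both students."""
    la, lb = student_a(x), student_b(x)
    loss_a = F.cross_entropy(la, y) + soft_loss(la, lb.detach(), T)
    loss_b = F.cross_entropy(lb, y) + soft_loss(lb, la.detach(), T)
    loss = loss_a + loss_b
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```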
  loss function (sketch below)
    Kullback-Leibler divergence
    MSE
    cosine similarity
    cross entropy
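Any of the four losses can act as the teacher-student distance. A small PyTorch sketch, assuming `s` and `t` are student and teacher logits (for the KL and cross-entropy terms) or feature vectors (for the MSE and cosine terms):

```python
import torch
import torch.nn.functional as F

def kd_losses(s, t, y=None, T=4.0):
    """Common distillation objectives. `s`/`t` are student/teacher outputs;
    `y` are ground-truth labels (used only by cross entropy)."""
    losses = {
        # KL divergence between temperature-softened class distributions
        "kl": F.kl_div(F.log_softmax(s / T, dim=1),
                       F.softmax(t / T, dim=1),
                       reduction="batchmean") * T * T,
        # MSE directly on logits or intermediate features
        "mse": F.mse_loss(s, t),
        # cosine similarity turned into a loss (1 - similarity)
        "cosine": 1.0 - F.cosine_similarity(s, t, dim=1).mean(),
    }
    if y is not None:
        # standard cross entropy against hard labels, usually
        # combined with one of the soft losses above
        losses["ce"] = F.cross_entropy(s, y)
    return losses
```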
  KD - ViT (DeiT loss sketch below)
    one-to-all spatial matching
      Knowledge Distillation via the Target-aware Transformer
    Fine-Grained Manifold Distillation Method
    Co-advise: Cross Inductive Bias Distillation
    TinyViT: Fast Pretraining Distillation for Small Vision Transformers
    Attention Probe: Vision Transformer Distillation in the Wild
    Training data-efficient image transformers & distillation through attention
    Unified Visual Transformer Compression
    DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
    Cross-Architecture Knowledge Distillation
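As one concrete example from this branch, DeiT ("Training data-efficient image transformers & distillation through attention") adds a learnable distillation token whose head is supervised by a (typically convolutional) teacher. A sketch of its hard-distillation loss only, assuming the student already returns separate class-token and distillation-token logits:

```python
import torch
import torch.nn.functional as F

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, y):
    """Hard-label DeiT distillation: the class-token head learns from the
    ground-truth labels, the distillation-token head learns from the
    teacher's argmax predictions; the two terms are averaged."""
    teacher_labels = teacher_logits.argmax(dim=1)
    loss_cls = F.cross_entropy(cls_logits, y)
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)
    return 0.5 * (loss_cls + loss_dist)

# At inference time, DeiT averages the predictions of the two heads:
# probs = (cls_logits.softmax(-1) + dist_logits.softmax(-1)) / 2
```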