Paper Storylines
"Metrics are not good at the moment"
Case 2: Pruning as denoising
"Effects of compression on models
and their representations"
"Holistic view of compressing models"
- Looking at more than one
Case 1: Symmetry --> bad
SVCCA
PwCCA
LinCKA
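All three metrics above compare two activation matrices. As a reference point, a minimal numpy sketch of Linear CKA, the simplest of the three; the function name and random test matrices are illustrative, not taken from the experiments:

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center every feature channel before comparing
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    self_x = np.linalg.norm(x.T @ x, ord="fro")
    self_y = np.linalg.norm(y.T @ y, ord="fro")
    return cross / (self_x * self_y)

rng = np.random.default_rng(0)
acts = rng.standard_normal((200, 32))
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random orthogonal map
print(linear_cka(acts, acts))      # identical representations: ~1.0
print(linear_cka(acts, acts @ q))  # invariant to orthogonal transforms: ~1.0
```

Unlike the CCA-based metrics, Linear CKA is only invariant to orthogonal transforms and isotropic scaling, not to arbitrary invertible linear maps, which is relevant to the symmetry discussion in Case 1.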
The relevance of relevance:
Evaluating the effects of noise and the importance of representations.
Experiments
Toy Experiment showing
different cases.
Hypothesis 1:
Important representation channels
with some uncorrelated noise channels
Hypothesis 2:
Important representation channels,
some uncorrelated representation channels,
some correlated noise channels
Hypothesis 3:
Varying degrees of amplitude for the noise/signal channels.
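The three hypotheses can be instantiated with synthetic activations. A sketch under assumed parameters (all channel counts and amplitudes are made up): two "model" representations share signal channels plus a correlated noise block, roughly Hypothesis 2; sweeping `signal_amp`/`noise_amp` covers Hypothesis 3:

```python
import numpy as np

def toy_reps(n=1000, n_signal=20, n_uncorr=10, n_corr=10,
             signal_amp=1.0, noise_amp=1.0, seed=0):
    """Two toy activation matrices: shared signal channels, per-model
    uncorrelated noise channels, and a shared (correlated) noise block."""
    rng = np.random.default_rng(seed)
    signal = signal_amp * rng.standard_normal((n, n_signal))   # task-relevant, shared
    corr_noise = noise_amp * rng.standard_normal((n, n_corr))  # noise shared by both models

    def model_rep():
        uncorr = noise_amp * rng.standard_normal((n, n_uncorr))  # unique to this model
        return np.hstack([signal, uncorr, corr_noise])

    return model_rep(), model_rep()

rep_a, rep_b = toy_reps()
print(rep_a.shape)  # (1000, 40)
```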
Pruning Experiments
Exp1:
Similarity between pruned models
vs
Similarity between unpruned models
Finding:
Hard to draw a clear conclusion:
results are somewhat mixed.
Exp 2:
Pruning keeps Identity
Experiment
Hypothesis:
Pruning removes channels that are not
relevant for the (classification) task.
These could be uncorrelated or correlated noise channels.
Hypothesis:
Pruning maintains the decisions of the original model, so the representation similarity of, e.g., unpruned Model 0 to pruned Model 0 should be greater than that of
pruned Model 1 to unpruned Model 0.
Finding:
True for SVCCA and LinCKA
Structured Pruning
Unstructured Pruning
Linear CKA
SVCCA
Issues
Whatever the similarity
metrics report, we still observe
that the models are within linear stitching range.
For LinCKA this is especially problematic: the similarity at layers 10-12 is 0.2 or less, yet stitching shows no accuracy loss at all ❌
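The stitching check can be approximated with ordinary least squares instead of a trained stitching layer; a sketch under that assumption (activations are synthetic stand-ins, not the actual VGG/ResNet activations):

```python
import numpy as np

def fit_stitch(acts_a, acts_b):
    """OLS stitching map W with acts_a @ W ~ acts_b, i.e. a linear layer
    translating model A's activations into model B's activation space."""
    w, *_ = np.linalg.lstsq(acts_a, acts_b, rcond=None)
    return w

rng = np.random.default_rng(1)
acts_a = rng.standard_normal((500, 16))
mix = rng.standard_normal((16, 16))  # arbitrary (non-orthogonal) linear map
acts_b = acts_a @ mix                # model B sees a linearly transformed copy
w = fit_stitch(acts_a, acts_b)
rel_err = np.linalg.norm(acts_a @ w - acts_b) / np.linalg.norm(acts_b)
print(rel_err)  # ~0: within linear stitching range despite a non-trivial transform
```

This illustrates the failure mode above: a general linear map keeps two models stitchable with no loss, while Linear CKA can still report low similarity, since it is not invariant to such maps.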
Trained Transfer
Exp. 3:
Pruning Progression
at different Timesteps
Exp 4:
Distillation
ToDo:
Add "Similarity" of OLS
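One way to read "Similarity of OLS": the R² of regressing one model's activations onto the other's. A hypothetical sketch, where the name `ols_similarity` and its normalization are assumptions rather than settled notation:

```python
import numpy as np

def ols_similarity(x, y):
    """R^2 of the best linear map x -> y: share of y's centered variance
    explained by x. Note this is asymmetric; symmetrize (min/mean of both
    directions) if a symmetric score is needed."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    w, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ w
    return 1.0 - (resid ** 2).sum() / (y ** 2).sum()

rng = np.random.default_rng(2)
a = rng.standard_normal((1000, 16))
lin_img = a @ rng.standard_normal((16, 8))  # a linear image of a
noise = rng.standard_normal((1000, 8))      # independent noise
print(ols_similarity(a, lin_img))  # ~1.0
print(ols_similarity(a, noise))    # near 0
```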
"Toy" Experiment
Remove some Layers of VGG/ResNet
CCA Similarities and Stitching (Symmetry)
Other metrics as well.
Check the result generation for the
same-model-vs-its-pruned-version experiment.
Reduction to 5 folds
Architectures:
ResNet18
ResNet101
Vgg11
Vgg19
ImageNet Experiments.