Learning dynamics
Can this project connect with learning dynamics?
Rich/lazy regimes, saddle-to-saddle dynamics, AGF. Large learning rates do not preserve the shape, while regularization drives the parameters back towards the cone [kunin2025, du2018anips] and maybe [kunin2024, wang2022].
Experiment: constrained optimization on \(\mathcal H(0)\), i.e. gradient descent on the cone.
Alternate between gradient updates and projections onto the cone to approximate the learning dynamics on the cone.
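A minimal sketch of this alternation as projected gradient descent, assuming a placeholder `project_to_cone` that maps parameters back onto \(\mathcal H(0)\) (its form depends on how the cone is parametrized; all names here are illustrative, not the project's actual code):

```python
def projected_gradient_descent(theta0, loss_grad, project_to_cone,
                               lr=1e-2, num_steps=1000):
    """Approximate the learning dynamics on the cone by alternating
    a plain gradient step with a projection back onto H(0)."""
    theta = project_to_cone(theta0)            # start on the cone
    for _ in range(num_steps):
        theta = theta - lr * loss_grad(theta)  # unconstrained gradient step
        theta = project_to_cone(theta)         # snap back onto the cone
    return theta
```

For a small enough learning rate, this scheme should track the constrained gradient flow on the cone, with a per-step error of order \(\mathrm{lr}^2\).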
Permutations need not be handled during training because they are transparent to the dynamics: gradients are permutation-equivariant, so permuting the parameters permutes the derivatives in exactly the same way.
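A quick numerical check of this claim on a toy two-layer ReLU network (the architecture and MSE loss are illustrative assumptions): permuting the hidden units permutes the gradients in the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 5
W1 = rng.normal(size=(h, d))   # input weights
W2 = rng.normal(size=(1, h))   # output weights
x = rng.normal(size=(d, 1))
y = rng.normal(size=(1, 1))

def grads(W1, W2):
    """Gradients of the MSE loss 0.5 * (W2 relu(W1 x) - y)^2."""
    a = W1 @ x                 # pre-activations, (h, 1)
    r = np.maximum(a, 0.0)     # ReLU activations
    e = float(W2 @ r - y)      # scalar residual
    dW2 = e * r.T              # (1, h)
    da = e * W2.T * (a > 0)    # back-prop through the ReLU, (h, 1)
    dW1 = da @ x.T             # (h, d)
    return dW1, dW2

P = np.eye(h)[rng.permutation(h)]       # random permutation matrix
dW1, dW2 = grads(W1, W2)                # gradients at theta
dW1p, dW2p = grads(P @ W1, W2 @ P.T)    # gradients at the permuted theta

# the gradient at the permuted point is the permuted gradient
assert np.allclose(dW1p, P @ dW1)
assert np.allclose(dW2p, dW2 @ P.T)
```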
Log everything during training, in particular the rescalings used, \(\alpha_1, \dots, \alpha_T\): they are indicative of the shape distortion and therefore probably of training success (many distortions = stronger gradients for longer = noisier/more complex dataset).
Also log \(\theta_{\text{input}}\) (which is equal to \(\theta_{\text{output}}\)).
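A sketch of how this logging could hook into the projected loop above, under two assumptions not in the original notes: that `project_to_cone` also reports the rescaling \(\alpha_t\) it applied, and that a hypothetical `measure_angle` extracts \(\theta_{\text{input}}\) from the parameters.

```python
def train_with_logging(theta0, loss_grad, project_to_cone, measure_angle,
                       lr=1e-2, num_steps=1000):
    """Projected gradient descent that logs the per-step rescaling
    alpha_t and the input-side angle theta_input at every step."""
    theta, _ = project_to_cone(theta0)
    log = {"alpha": [], "theta_input": []}
    for _ in range(num_steps):
        theta = theta - lr * loss_grad(theta)    # gradient step
        theta, alpha_t = project_to_cone(theta)  # projection reports its rescaling
        log["alpha"].append(alpha_t)             # proxy for shape distortion
        log["theta_input"].append(measure_angle(theta))  # = theta_output on the cone
    return theta, log
```

Large or persistent \(\alpha_t\) in the log would then flag the "many distortions" regime noted above.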