Cyclic Learning Rates
Cycle the learning rate between a lower bound and an upper bound throughout the entire run
Conventionally, the learning rate is instead decayed over time as training starts to converge
A cycle is the number of iterations it takes to go from the lower bound learning rate up to the upper bound and back down to the lower bound (a minimal sketch follows)
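A small sketch of the triangular schedule from Smith's cyclical learning rate paper, assuming the usual names: step_size is the half-cycle length in iterations, base_lr and max_lr are the two bounds (PyTorch ships a ready-made version as torch.optim.lr_scheduler.CyclicLR):

```python
import math

def triangular_lr(iteration, step_size, base_lr, max_lr):
    """Triangular cyclical learning rate: base_lr -> max_lr -> base_lr
    over one full cycle of 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

For example, with step_size=2000, base_lr=1e-4 and max_lr=1e-2, the learning rate climbs for 2000 iterations, falls for the next 2000, then repeats.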
One Cycle Policy
In the last remaining iterations, annihilate the learning rate to well below its initial value (to roughly 1/10th or 1/100th of it)
Motivation: during the middle of training, the higher learning rates act as regularization and keep the network from overfitting; this helps the network avoid steep areas of the loss surface and land in better, flatter minima. A sketch of the full schedule follows.
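A rough, self-contained sketch of the policy as described in these notes: linear warmup to the peak learning rate, a symmetric cooldown, then annihilation over the final iterations. The function and parameter names here are illustrative assumptions, not any library's API:

```python
def one_cycle_lr(iteration, total_iters, max_lr,
                 div_factor=10.0, final_div=100.0, annihilate_pct=0.1):
    """Piecewise-linear one-cycle sketch: warm up from max_lr / div_factor
    to max_lr, cool back down, then annihilate the learning rate to
    max_lr / (div_factor * final_div) over the last annihilate_pct of
    all iterations."""
    base_lr = max_lr / div_factor
    cycle_end = int(total_iters * (1 - annihilate_pct))
    peak = max(1, cycle_end // 2)
    if iteration <= peak:                       # first half: ramp up
        return base_lr + (max_lr - base_lr) * iteration / peak
    if iteration <= cycle_end:                  # second half: ramp down
        pct = (iteration - peak) / (cycle_end - peak)
        return max_lr - (max_lr - base_lr) * pct
    # final stretch: annihilate well below base_lr
    pct = (iteration - cycle_end) / (total_iters - cycle_end)
    return base_lr - (base_lr - base_lr / final_div) * pct
```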
One Cycle Policy
Have an initial learning rate and a maximum learning rate, and for each iteration use the learning rate obtained from the equation (a plausible form is given below)
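The equation itself did not survive the export. Assuming the piecewise-linear interpolation from Smith's report, it is presumably of the form

```latex
\eta_t = \eta_{\mathrm{init}} + (\eta_{\max} - \eta_{\mathrm{init}}) \cdot \frac{t}{T_{\mathrm{up}}}, \qquad 0 \le t \le T_{\mathrm{up}},
```

with the mirror-image decrease from the maximum back to the initial learning rate over the second half of the cycle.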
Choose a learning rate an order of magnitude lower than the learning rate at which the loss is at its minimum, because this is a learning rate at which the loss is still decreasing (see the range test sketch below)
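A minimal sketch of the learning rate range test this rule of thumb comes from; model, loss_fn, and loader are placeholders for your own training pieces. The learning rate grows exponentially each mini-batch while the loss is recorded, and the maximum learning rate for the cycle is then picked roughly an order of magnitude below the loss minimum:

```python
import copy

import torch

def lr_range_test(model, loss_fn, loader,
                  lr_lo=1e-7, lr_hi=10.0, num_iters=100):
    """Exponentially sweep the learning rate from lr_lo to lr_hi over
    num_iters mini-batches, recording (lr, loss) pairs to plot."""
    model = copy.deepcopy(model)  # throwaway copy; keep real weights intact
    opt = torch.optim.SGD(model.parameters(), lr=lr_lo)
    gamma = (lr_hi / lr_lo) ** (1.0 / num_iters)  # per-step LR multiplier
    lrs, losses = [], []
    batches = iter(loader)
    for _ in range(num_iters):
        try:
            x, y = next(batches)
        except StopIteration:
            batches = iter(loader)
            x, y = next(batches)
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        lrs.append(opt.param_groups[0]["lr"])
        losses.append(loss.item())
        for group in opt.param_groups:
            group["lr"] *= gamma
    return lrs, losses
```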
Cyclic Momentum
One intuition is that in the high learning rate part of training, we want SGD to move quickly in new directions to find a better minimum, so the new gradients need to be given more weight, i.e. momentum should be lowered there (see the sketch below)
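PyTorch's built-in one-cycle scheduler couples momentum inversely to the learning rate when cycle_momentum=True; a usage sketch with placeholder model and hyperparameters:

```python
import torch

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.95)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=0.1,
    total_steps=1000,
    cycle_momentum=True,   # momentum moves opposite to the learning rate
    base_momentum=0.85,    # momentum when the learning rate peaks
    max_momentum=0.95,     # momentum at the start and end of the cycle
)

for step in range(1000):
    # ...forward, backward, and optimizer.step() on a real batch here...
    scheduler.step()  # advance LR and momentum once per batch
```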
Weight Decay
Use smaller values of weight decay, like 1e-4 or 1e-5. Try bisecting the weight decay on a log scale, e.g. 3e-4, 1e-4, 3e-5 (a grid search sketch follows)
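A sketch of that search as a small log-scale grid; the model, learning rate, and training loop are placeholders:

```python
import torch

# "Bisect" on a log scale: 3e-4 sits roughly halfway (geometrically)
# between 1e-3 and 1e-4, 3e-5 between 1e-4 and 1e-5, and so on.
for wd in (3e-4, 1e-4, 3e-5, 1e-5):
    model = torch.nn.Linear(10, 2)  # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=wd)
    # ...train for a few epochs and keep the weight decay with the
    # best validation loss...
```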