Deep Learning Tricks
Optimizers
SGD: update the weights by stepping against the gradient of the loss computed on each mini-batch.
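A minimal NumPy sketch of a vanilla SGD step; the function name, toy quadratic objective, and learning rate are illustrative choices, not part of the original notes.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One vanilla SGD update: step against the mini-batch gradient."""
    return w - lr * grad

# Toy run: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, 2 * w)
print(w)  # approaches [0, 0]
```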
Momentum: keep a running average of past gradients and step along it, instead of following only the gradient of the current batch of data.
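A minimal sketch of the momentum update under the same toy setup; `beta` (the decay rate of the running average) and the other names are assumed for illustration.

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.05, beta=0.9):
    """Momentum update: v is an exponentially decaying running average
    of past gradients; the step follows v, not the raw batch gradient."""
    v = beta * v + grad  # fold the current gradient into the history
    return w - lr * v, v

# Toy run on f(w) = ||w||^2 (gradient 2w).
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, v, 2 * w)
print(w)  # approaches [0, 0]
```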
Doesn't work? Try an adaptive optimizer:
ADAGRAD: adapts the learning rate per parameter by scaling each update with the inverse square root of that parameter's accumulated squared gradients, so the global learning rate needs far less hand-tuning.
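A minimal sketch of the Adagrad update; the per-parameter accumulator `g2_sum`, the `eps` stabilizer, and the toy objective are illustrative assumptions.

```python
import numpy as np

def adagrad_step(w, g2_sum, grad, lr=0.5, eps=1e-8):
    """Adagrad update: scale each parameter's step by the inverse square
    root of its accumulated squared gradients, so parameters with a large
    gradient history get smaller effective learning rates."""
    g2_sum = g2_sum + grad ** 2  # per-parameter gradient history
    return w - lr * grad / (np.sqrt(g2_sum) + eps), g2_sum

# Toy run on f(w) = ||w||^2 (gradient 2w).
w = np.array([1.0, -2.0])
g2_sum = np.zeros_like(w)
for _ in range(200):
    w, g2_sum = adagrad_step(w, g2_sum, 2 * w)
print(w)  # decays toward [0, 0]
```

Note that because the accumulator only grows, the effective learning rate decays monotonically over training.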