Deep Learning Tricks
Optimizers
SGD: update the weights by stepping against the gradient of the loss computed on each mini-batch.
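A minimal NumPy sketch of a vanilla SGD step; the function name, toy quadratic objective, and learning rate are illustrative choices, not part of the original notes.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """One vanilla SGD update: step against the mini-batch gradient."""
    return w - lr * grad

# Toy run: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, 2 * w)
print(w)  # approaches [0, 0]
```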
Momentum: keep a running average of past gradients and step along it, instead of following only the gradient of the current batch of data.
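A minimal sketch of the momentum update under the same toy setup; `beta` (the decay rate of the running average) and the other names are assumed for illustration.

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.05, beta=0.9):
    """Momentum update: v is an exponentially decaying running average
    of past gradients; the step follows v, not the raw batch gradient."""
    v = beta * v + grad  # fold the current gradient into the history
    return w - lr * v, v

# Toy run on f(w) = ||w||^2 (gradient 2w).
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = momentum_step(w, v, 2 * w)
print(w)  # approaches [0, 0]
```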
Doesn't work? Try an adaptive optimizer:
ADAGRAD: adapts the learning rate per parameter by scaling each update with the inverse square root of that parameter's accumulated squared gradients, so the global learning rate needs far less hand-tuning.
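A minimal sketch of the Adagrad update; the per-parameter accumulator `g2_sum`, the `eps` stabilizer, and the toy objective are illustrative assumptions.

```python
import numpy as np

def adagrad_step(w, g2_sum, grad, lr=0.5, eps=1e-8):
    """Adagrad update: scale each parameter's step by the inverse square
    root of its accumulated squared gradients, so parameters with a large
    gradient history get smaller effective learning rates."""
    g2_sum = g2_sum + grad ** 2  # per-parameter gradient history
    return w - lr * grad / (np.sqrt(g2_sum) + eps), g2_sum

# Toy run on f(w) = ||w||^2 (gradient 2w).
w = np.array([1.0, -2.0])
g2_sum = np.zeros_like(w)
for _ in range(200):
    w, g2_sum = adagrad_step(w, g2_sum, 2 * w)
print(w)  # decays toward [0, 0]
```

Note that because the accumulator only grows, the effective learning rate decays monotonically over training.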