DL(17) Vanishing/Exploding of Gradients remedies
Architectural
Gated Recurrent Units (GRU)
it simplifies the LSTM by using a single gating unit that simultaneously controls the forgetting factor and the decision to update the state unit
h(t) = z(t) ⊙ h(t-1) + (1 - z(t)) ⊙ 𝜎(Ux(t) + W(r(t) ⊙ h(t-1)))
the update gate z: selects whether the hidden state is to be updated with a new hidden state
the reset gate r: decides whether the previous hidden state is ignored
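A minimal NumPy sketch of this update; the gate computations, weight shapes and values are illustrative assumptions, and the note's 𝜎 is kept for the candidate state:

    import numpy as np

    def gru_step(x, h_prev, U, W, Uz, Wz, Ur, Wr):
        # one step of h(t) = z(t) ⊙ h(t-1) + (1 - z(t)) ⊙ 𝜎(Ux(t) + W(r(t) ⊙ h(t-1)))
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        z = sigmoid(Uz @ x + Wz @ h_prev)   # update gate (standard GRU form, assumed)
        r = sigmoid(Ur @ x + Wr @ h_prev)   # reset gate (standard GRU form, assumed)
        # candidate state uses 𝜎 as written in the note (tanh is also common in practice)
        return z * h_prev + (1.0 - z) * sigmoid(U @ x + W @ (r * h_prev))

    # illustrative shapes only
    rng = np.random.default_rng(0)
    n_in, n_h = 3, 5
    U, Uz, Ur = (rng.standard_normal((n_h, n_in)) * 0.1 for _ in range(3))
    W, Wz, Wr = (rng.standard_normal((n_h, n_h)) * 0.1 for _ in range(3))
    h = gru_step(rng.standard_normal(n_in), np.zeros(n_h), U, W, Uz, Wz, Ur, Wr)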
Long Short-Term Memory
allows the networks to "remember relevant information" for a long period of time
idea: replace the sigmoid unit with something that is easier to deal with when performing gradient descent, in order to preserve information
structure
3 gate units with sigmoid activation
output gate ON: lets the current value stored in the memory cell be read (fed as input to the rest of the network)
forget gate OFF: lets the current value stored in the memory cell be reset to 0; this is crucial for LSTM performance
input gate ON: lets the input flow into the memory cell
peephole connections
allow the memory cell to directly control all the gates, allowing easier learning of precise timing
linear memory cell: integrates input information through time
memory obtained by a self-loop
gradient not down-sized by the Jacobian of a sigmoidal function → no vanishing gradient (see the sketch below)
Full BPTT (backpropagation through time) is used for training
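A minimal NumPy sketch of the structure above, without peephole connections; the weight shapes, the tanh squashing of the cell output and all parameter names are simplifying assumptions:

    import numpy as np

    def lstm_step(x, h_prev, c_prev, Uf, Wf, Ui, Wi, Uo, Wo, Uc, Wc):
        # 3 sigmoid gate units plus a linear memory cell with a self-loop
        sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
        f = sigmoid(Uf @ x + Wf @ h_prev)   # forget gate: OFF -> cell value reset towards 0
        i = sigmoid(Ui @ x + Wi @ h_prev)   # input gate:  ON  -> input flows into the cell
        o = sigmoid(Uo @ x + Wo @ h_prev)   # output gate: ON  -> cell value is read out
        c_new = f * c_prev + i * np.tanh(Uc @ x + Wc @ h_prev)   # linear self-loop integrates input over time
        h_new = o * np.tanh(c_new)
        return h_new, c_new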
Reservoir Computing
fix the input-to-hidden and hidden-to-hidden connections at random values and learn only the output connections
in this way there is no backpropagation ⇒ no exploding/vanishing gradient problem
Echo State Networks (ESN)
standard recurrent neurons
leaky integrator unit: h(t) = (1 - a) h(t-1) + 𝜎(Ux(t) + Wh(t-1)), where a is the leaky decay rate (a < 1)
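A one-line NumPy sketch of the leaky-integrator update, assuming tanh as the sigmoidal nonlinearity 𝜎 and an illustrative decay rate:

    import numpy as np

    def leaky_esn_step(x, h_prev, U, W, a=0.2):
        # h(t) = (1 - a) h(t-1) + 𝜎(Ux(t) + Wh(t-1)); tanh plays the role of 𝜎 here
        return (1.0 - a) * h_prev + np.tanh(U @ x + W @ h_prev)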
Liquid State Machines (LSM)
spiking integrate-and-fire neurons
neurons switch off for some time after activation
these methods try to reproduce the behaviour of real brain neurons
Reservoir Computing: additional details
the reservoir is randomly created and remains unchanged during training
the hidden state h(t) maintains a nonlinear version of the input history
to produce a rich set of dynamics, the reservoir should:
be big
be sparsely (W with up to 20% of the possible connections) and randomly connected
satisfy the echo state property: ρ(W) < 1, i.e. the spectral radius of W is below 1 (a construction sketch follows below)
it is passively excited by the input x(t)
on the contrary, the input matrix U and the output matrix O are dense
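A NumPy sketch of how such a reservoir matrix W could be built; the size, density and target spectral radius are illustrative assumptions:

    import numpy as np

    def make_reservoir(n=500, density=0.1, spectral_radius=0.9, seed=0):
        # big, sparse, random W, rescaled so that ρ(W) < 1 (echo state property)
        rng = np.random.default_rng(seed)
        W = rng.uniform(-1.0, 1.0, size=(n, n))
        W *= rng.random((n, n)) < density            # keep ~10% of the possible connections
        rho = np.max(np.abs(np.linalg.eigvals(W)))   # current spectral radius
        return W * (spectral_radius / rho)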
the output is computed as a linear combination of the input-excited reservoir; the linear combination is obtained by linear regression
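A NumPy sketch of the readout: excite the fixed reservoir with the input, collect the states, and fit the output weights by least-squares linear regression. The tanh update and the array shapes are assumptions; ridge regression is often used instead of plain least squares for numerical stability.

    import numpy as np

    def train_readout(xs, ys, U, W):
        # drive the fixed reservoir with the input sequence, then fit the
        # output weights by least-squares linear regression on the states
        h = np.zeros(W.shape[0])
        states = []
        for x in xs:                       # the reservoir is only excited, never trained
            h = np.tanh(U @ x + W @ h)
            states.append(h.copy())
        H = np.asarray(states)             # (time, n) matrix of reservoir states
        O, *_ = np.linalg.lstsq(H, np.asarray(ys), rcond=None)
        return O                           # predictions are H @ O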
a simple cycle reservoir obtains performance comparable to an ESN
the memory capacity of a simple linear cyclic reservoir can be made close to the proven optimal memory capacity value
on the other hand, very simple topologies can be very effective
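A sketch of a simple cycle reservoir matrix, assuming a ring in which every unit feeds the next one with the same weight r; the concrete values are illustrative:

    import numpy as np

    def simple_cycle_reservoir(n=100, r=0.9):
        # ring topology: unit i feeds only unit (i + 1) mod n, all weights equal to r
        W = np.zeros((n, n))
        for i in range(n):
            W[(i + 1) % n, i] = r
        return W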
Intrinsic Plasticity (IP)
an efficient online learning rule to adjust the threshold and gain of sigmoid reservoir neurons:
it drives the neurons' output activities to approximate exponential distributions
the exponential distribution maximizes the entropy of a non-negative random variable with a fixed mean, thus enabling the neurons to transmit maximal information
alternative topologies for the reservoir were compared with no significant improvement
Deep Echo State Networks
deep version of the reservoir
mix
reservoirs in parallel
deep feed-forward stacking
Memory Capacity
task: reconstruct the input with increasing delay
memory capacity: ∑ₖ r²(x(t-k), oᵏ(t))
where r²(x(t-k), oᵏ(t)) is the squared correlation coefficient between:
input: x(t-k) with delay k
corresponding output: oᵏ(t), generated by the net at time t for delay k
target: yᵏ(t) = x(t-k), ∀k ∈ [0, ..., ∞]
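A sketch of an empirical estimate of this quantity, assuming a finite maximum delay k_max, a 1-D input signal x, a precomputed (time, n) matrix H of reservoir states driven by x, and one least-squares readout per delay:

    import numpy as np

    def memory_capacity(x, H, k_max=50):
        # MC ≈ ∑ₖ r²(x(t-k), oᵏ(t)), with oᵏ a readout trained to reproduce x(t-k)
        mc = 0.0
        for k in range(1, k_max + 1):
            Hk, target = H[k:], x[:-k]                 # align the state at time t with x(t-k)
            w, *_ = np.linalg.lstsq(Hk, target, rcond=None)
            r = np.corrcoef(Hk @ w, target)[0, 1]      # correlation between oᵏ(t) and x(t-k)
            mc += r ** 2
        return mc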