DL(15) Architectural Features of RNN
how to implement f(∙) and g(∙) ?
Linear
Hidden Markov Model (probabilistic model)
Linear Dynamical System (Linear Autoencoders for sequence)
Kalman Filter (f,g linear)
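A minimal NumPy sketch of the linear case, where both the transition f and the readout g are linear maps (the state-space form behind the Kalman filter; the matrix names A, B, C are the usual state-space convention, not from this diagram):

    import numpy as np

    def linear_dynamical_system(x_seq, A, B, C, h0=None):
        """Linear f and g: h(t) = A h(t-1) + B x(t), o(t) = C h(t)."""
        h = np.zeros(A.shape[0]) if h0 is None else h0
        outputs = []
        for x in x_seq:
            h = A @ h + B @ x        # linear state update (f)
            outputs.append(C @ h)    # linear readout (g)
        return np.array(outputs)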
Nonlinear
Recurrent Neural Network
Shallow RNN
h(t) = f(Ux(t) + Wh(t-1) + b)
o(t) = g(Vh(t) + c)
f(∙) and g(∙) are non-linear functions, and h(0) = 0
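A minimal NumPy sketch of this shallow RNN (tanh for f and a linear g are illustrative assumptions; in practice g is often a softmax):

    import numpy as np

    def shallow_rnn(x_seq, U, W, b, V, c):
        h = np.zeros(W.shape[0])               # h(0) = 0, as stated above
        outputs = []
        for x_t in x_seq:
            h = np.tanh(U @ x_t + W @ h + b)   # h(t) = f(Ux(t) + Wh(t-1) + b)
            outputs.append(V @ h + c)          # o(t) = g(Vh(t) + c), g linear here
        return outputs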
Architectural Features for RNN
higher-order states
if we want dependence on (t-2), (t-3), ...
h(t) = f(Ux(t) + W¹h(t-1) + W²h(t-2) + b)
we need a new set of parameters W²; it is associated with the delay q⁻²
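A sketch of the second-order update, keeping the two most recent states (tanh is an assumed f):

    import numpy as np

    def higher_order_rnn(x_seq, U, W1, W2, b):
        N = W1.shape[0]
        h_prev1, h_prev2 = np.zeros(N), np.zeros(N)   # h(0) = h(-1) = 0
        states = []
        for x_t in x_seq:
            # W1 acts on h(t-1) (delay q^-1), W2 on h(t-2) (delay q^-2)
            h = np.tanh(U @ x_t + W1 @ h_prev1 + W2 @ h_prev2 + b)
            h_prev2, h_prev1 = h_prev1, h
            states.append(h)
        return states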
feedback from output (context)
h(t) = f(Ux(t) + Wh(t-1) + Zo(t-1))
Z can help learning with teacher forcing: instead of feeding back the previous output, we feed back the target
h(t) also depends on the output o(t-1)
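A sketch of output feedback with optional teacher forcing (tanh f and linear g are assumptions; targets is the sequence of training targets):

    import numpy as np

    def context_rnn(x_seq, targets, U, W, Z, b, V, c, teacher_forcing=True):
        h = np.zeros(W.shape[0])
        o = np.zeros(Z.shape[1])                  # o(0) = 0
        outputs = []
        for t, x_t in enumerate(x_seq):
            # with teacher forcing, feed back the previous *target*
            # instead of the previous output o(t-1)
            feedback = targets[t - 1] if (teacher_forcing and t > 0) else o
            h = np.tanh(U @ x_t + W @ h + Z @ feedback + b)
            o = V @ h + c
            outputs.append(o)
        return outputs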
short-cut connections
if we want the output to depend directly on the input
o(t) = g(Vʰh(t) + Vˣx(t) + c)
we need a new set of parameters Vˣ
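Only the output equation changes, so a sketch of just that step (parameter names Vh, Vx mirror the formula above):

    import numpy as np

    def shortcut_output(x_t, h_t, Vh, Vx, c):
        # o(t) depends on the state h(t) and, directly, on the input x(t)
        return Vh @ h_t + Vx @ x_t + c

Plugging this readout into the shallow RNN loop above gives the short-cut architecture.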
bidirectional RNN
o(t) = g(Vᵖhᵖ(t) + Vᶠhᶠ(t))
hᵖ(t) = f(Uᵖx(t) + Wᵖhᵖ(t-1))
hᶠ(t) = f(Uᶠx(t) + Wᶠhᶠ(t+1))
used when the sequence is positional and not temporal
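A sketch of the bidirectional forward pass: one left-to-right sweep for the past state hᵖ and one right-to-left sweep for the future state hᶠ (tanh f and linear g are assumptions):

    import numpy as np

    def bidirectional_rnn(x_seq, Up, Wp, Uf, Wf, Vp, Vf):
        T, N = len(x_seq), Wp.shape[0]
        h_p = np.zeros((T, N))
        h_f = np.zeros((T, N))
        h = np.zeros(N)
        for t in range(T):                 # past state, left to right
            h = np.tanh(Up @ x_seq[t] + Wp @ h)
            h_p[t] = h
        h = np.zeros(N)
        for t in reversed(range(T)):       # future state, right to left
            h = np.tanh(Uf @ x_seq[t] + Wf @ h)
            h_f[t] = h
        return [Vp @ h_p[t] + Vf @ h_f[t] for t in range(T)]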
How to Learn: Time Unfolding
The unfolded network has a feedforward structure: the gradient can be computed with standard backpropagation
Difference between RNN and feedforward NN
in an RNN the same set of parameters is used in every layer (time step)
in a feedforward NN, every time we move to another layer the parameters change
BPTT vs RTRL (N = number of units, T = sequence length)
BPTT
Space: O(NT)
Time: O(N²T)
computes all outputs, then backpropagates the errors
RTRL
Space: O(N³)
Time: O(N⁴)
we can change the weights as soon as an error is available (online update)
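A minimal BPTT sketch for the shallow RNN above (tanh f, linear g, and a squared-error loss are assumptions): all T outputs are computed first, which is why the stored states cost O(NT) space, then errors are backpropagated through the unfolded network:

    import numpy as np

    def bptt(x_seq, y_seq, U, W, b, V, c):
        T, N = len(x_seq), W.shape[0]
        hs, outputs = [np.zeros(N)], []
        for t in range(T):                        # forward: compute all outputs
            hs.append(np.tanh(U @ x_seq[t] + W @ hs[-1] + b))
            outputs.append(V @ hs[-1] + c)
        dU, dW, db = np.zeros_like(U), np.zeros_like(W), np.zeros_like(b)
        dV, dc = np.zeros_like(V), np.zeros_like(c)
        dh_next = np.zeros(N)
        for t in reversed(range(T)):              # backward through time
            do = outputs[t] - y_seq[t]            # gradient of squared error
            dV += np.outer(do, hs[t + 1])
            dc += do
            dh = V.T @ do + dh_next               # error from output and from step t+1
            dz = dh * (1.0 - hs[t + 1] ** 2)      # through tanh
            dU += np.outer(dz, x_seq[t])
            dW += np.outer(dz, hs[t])             # same W reused at every step
            db += dz
            dh_next = W.T @ dz
        return dU, dW, db, dV, dc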