DL(21) - Restricted Boltzmann Machine
Separation
no connections among hⱼ (latent variables)
no connections among vᵢ (visible variables)
P(v) is intractable
P(h|v) is factorial and easy to compute
P(h|v) = ∏ⱼ σ[ (2h − 1) ⊙ (c + Wᵀv) ]ⱼ, with j = 1, …, nₕ
P(v|h) is factorial and easy to compute
P(v|h) = ∏ᵢ σ[ (2v − 1) ⊙ (b + Wh) ]ᵢ, with i = 1, …, nᵥ
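A minimal numpy sketch of the two factorial conditionals above; the sizes and random parameter values are illustrative placeholders, not from the notes:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 4, 3                     # toy sizes, chosen for illustration
b = rng.normal(size=n_v)            # visible bias
c = rng.normal(size=n_h)            # hidden bias
W = rng.normal(size=(n_v, n_h))     # interaction weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(h, v):
    """P(h|v) = prod_j sigma[(2h - 1) * (c + W^T v)]_j, factorial over hidden units."""
    return np.prod(sigmoid((2 * h - 1) * (c + W.T @ v)))

def p_v_given_h(v, h):
    """P(v|h) = prod_i sigma[(2v - 1) * (b + W h)]_i, factorial over visible units."""
    return np.prod(sigmoid((2 * v - 1) * (b + W @ h)))

# Sanity check: for a fixed v, P(h|v) sums to 1 over all 2^n_h hidden configurations.
v = rng.integers(0, 2, size=n_v)
total = sum(p_h_given_v(np.array(h), v)
            for h in itertools.product([0, 1], repeat=n_h))
print(total)  # ≈ 1.0
```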
Energy-based Undirected Model
E(v, h) = −bᵀv − cᵀh − vᵀWh
Z = ∑ᵥ ∑ₕ exp{ −E(v, h) }
P(v, h) = (1/Z) exp( −E(v, h) )
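A small numpy sketch of these three quantities for a toy binary RBM; the dimensions and parameter values are made up, and the brute-force Z is feasible only because nᵥ and nₕ are tiny:

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 3, 2                     # tiny, so brute-force enumeration is feasible
b = rng.normal(size=n_v)            # visible bias
c = rng.normal(size=n_h)            # hidden bias
W = rng.normal(size=(n_v, n_h))     # interaction weights

def energy(v, h):
    """E(v, h) = -b^T v - c^T h - v^T W h"""
    return -b @ v - c @ h - v @ W @ h

# All binary configurations of the visible and hidden units.
states_v = [np.array(s) for s in itertools.product([0, 1], repeat=n_v)]
states_h = [np.array(s) for s in itertools.product([0, 1], repeat=n_h)]

# Partition function Z = sum_v sum_h exp(-E(v, h)).
Z = sum(np.exp(-energy(v, h)) for v in states_v for h in states_h)

def joint_prob(v, h):
    """P(v, h) = exp(-E(v, h)) / Z"""
    return np.exp(-energy(v, h)) / Z

# The joint probabilities over all 2^(n_v + n_h) states sum to 1.
print(sum(joint_prob(v, h) for v in states_v for h in states_h))  # ≈ 1.0
```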
Why do we care about efficient computation of P(h|v) and P(v|h)?
computing ∇θ log p̃(x; θ) is not a problem
however, the term ∇θ log Z(θ) is more problematic
because of learning
it is easier to maximize the Log-Likelihood: since we assume the samples are independent, the product over samples becomes a summation of logs
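Spelled out as a one-line step (the standard i.i.d. argument the note alludes to, with m training samples x⁽¹⁾, …, x⁽ᵐ⁾ as assumed notation):

```latex
\log \prod_{i=1}^{m} p(x^{(i)}; \theta) = \sum_{i=1}^{m} \log p(x^{(i)}; \theta)
```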
given training data x we need to maximize p(x; θ)
however, its gradient w.r.t. θ is: ∇θ log p(x; θ) = ∇θ log p̃(x; θ) − ∇θ log Z(θ)
under some conditions it can be shown that ∇θ log Z(θ) = E_{x∼p(x)}[ ∇θ log p̃(x) ]
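A compact derivation of this identity, sketched assuming Z(θ) = ∑ₓ p̃(x; θ) and that the gradient can be exchanged with the sum; the key step uses ∇p̃ = p̃ ∇log p̃:

```latex
\nabla_\theta \log Z(\theta)
= \frac{\nabla_\theta Z(\theta)}{Z(\theta)}
= \frac{1}{Z(\theta)} \sum_x \nabla_\theta \tilde{p}(x;\theta)
= \sum_x \frac{\tilde{p}(x;\theta)}{Z(\theta)}\, \nabla_\theta \log \tilde{p}(x;\theta)
= \mathbb{E}_{x \sim p(x)}\!\left[ \nabla_\theta \log \tilde{p}(x;\theta) \right]
```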
this expectation cannot be computed exactly; however, we can use the Monte Carlo method to compute a good approximation of it
Monte Carlo Method
ŝₙ = (1/n) ∑ᵢ f(xᶤ), with xᶤ ∼ p
E[ ŝₙ ] = s
s = ∫ p(x) f(x) dx = Eₚ[ f(x) ]
however, sampling from p(x) is not always possible
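A minimal numpy sketch of the plain Monte Carlo estimator, for the case where sampling from p is possible; the particular f and p are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_estimate(f, sampler, n):
    """Unbiased estimator s_n = (1/n) * sum_i f(x_i) with x_i ~ p."""
    samples = sampler(n)
    return np.mean(f(samples))

# Example: s = E[x^2] for x ~ N(0, 1); the exact value is 1.
estimate = monte_carlo_estimate(lambda x: x**2, rng.standard_normal, n=100_000)
print(estimate)  # ≈ 1.0
```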
Importance Sampling
why is this useful?
a good q can reduce the variance
the estimator is still unbiased for every valid q
it may be feasible to sample from q even when sampling from p is not
p(x) f(x) = q(x) ∙ ( p(x) f(x) ) / q(x)
q(x) is our new p: we draw the samples from it
( p(x) f(x) ) / q(x) is our new f: we evaluate it at each sample
optimal q
Var[ ŝ_q ] = Var[ ( p(x) f(x) ) / q(x) ] / n, with the variance taken under x ∼ q
ŝ_q = (1/n) ∑ᵢ ( p(xᶤ) f(xᶤ) ) / q(xᶤ), with xᶤ ∼ q
E[ ŝ_q ] = E[ ŝₚ ] = s
minimum variance occurs when q = q٭, with q٭(x) = ( p(x) |f(x)| ) / Z, where Z normalizes q٭ into a proper distribution
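A numpy sketch contrasting the plain estimator with an importance-sampled one; the particular p, f, and proposal q are invented for illustration, and any q > 0 wherever p·f ≠ 0 keeps the estimator unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Target: s = E_p[f(x)] with p = N(0, 1) and f(x) = x^2, so s = 1 exactly.
f = lambda x: x**2

def norm_pdf(x, sigma):
    return np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Plain Monte Carlo: draw from p and average f.
x_p = rng.standard_normal(n)
s_plain = np.mean(f(x_p))

# Importance sampling: draw from the proposal q = N(0, sqrt(3)), which is
# closer to the optimal q*(x) ∝ p(x)|f(x)|, and reweight each evaluation by p/q.
sigma_q = np.sqrt(3.0)
x_q = sigma_q * rng.standard_normal(n)
weights = norm_pdf(x_q, 1.0) / norm_pdf(x_q, sigma_q)
s_is = np.mean(weights * f(x_q))

print(s_plain, s_is)  # both ≈ 1.0; the reweighted estimate has lower variance
```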