DL(20) - Structured Probabilistic Models
Graphical Models
Directed
Degree of Dependence
best case: P(x₁, x₂, x₃, x₄) = P(x₁) P(x₂) P(x₃) P(x₄)
worst case: P(x₁, x₂, x₃, x₄) = P(x₄ | x₁, x₂, x₃) P(x₃ | x₁, x₂) P(x₂ | x₁) P(x₁) (full chain rule; a parameter-count sketch follows below)
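To make the degree of dependence concrete, here is a minimal sketch, assuming binary variables, that counts the free parameters each factorization needs (the function names are mine):

```python
# Free parameters needed to store each factorization, assuming binary
# variables (each conditional P(x | parents) needs one number per
# configuration of its parents).

def n_params_independent(n):
    # best case: P(x1)...P(xn) -> one parameter per factor
    return n

def n_params_chain_rule(n):
    # worst case: P(xn | x1..xn-1) ... P(x1)
    # factor i (0-based) conditions on i parents -> 2**i parameters
    return sum(2 ** i for i in range(n))  # = 2**n - 1

print(n_params_independent(4))  # 4
print(n_params_chain_rule(4))   # 15 = 2**4 - 1
```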
influence clearly flows in one direction → causal relation
Undirected
influence is best modeled as flowing in both directions
Joint probability factorization
for each clique C in G, introduce a factor ɸ(C) (clique potential)
define the unnormalized probability distribution P'(X) = ∏ ɸ(C)
introduce the partition function Z to get the normalized probability distribution P(X) = (1/Z) ∏ ɸ(C)
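A minimal sketch of this factorization, brute-forcing Z for three binary variables on a chain a - b - c; the potential tables are invented for illustration:

```python
import itertools

# hypothetical clique potentials on a chain a - b - c:
# cliques {a, b} and {b, c}, each with a positive table phi
phi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
phi_bc = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}

def p_tilde(a, b, c):
    # unnormalized distribution P'(X) = product of clique potentials
    return phi_ab[(a, b)] * phi_bc[(b, c)]

# partition function Z: sum of P' over all joint states
Z = sum(p_tilde(a, b, c)
        for a, b, c in itertools.product([0, 1], repeat=3))

def p(a, b, c):
    # normalized distribution P(X) = P'(X) / Z
    return p_tilde(a, b, c) / Z

# sanity check: the normalized probabilities sum to 1
assert abs(sum(p(*x) for x in itertools.product([0, 1], repeat=3)) - 1) < 1e-9
print(p(1, 1, 0))
```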
Markov Networks
Factor Graphs: a richer representation; every clique is substituted with a factor term f1, f2, f3, ... (a sketch follows below)
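A sketch of why the factor-graph view is richer: a complete triangle over a, b, c is the same undirected graph whether the model uses one factor over all three variables or three pairwise factors, but listing factors with explicit scopes makes the difference visible (the scopes below are illustrative):

```python
# two different factor graphs over the same undirected graph
# (a complete triangle on a, b, c)

triangle_as_one_factor = [
    ("f1", ("a", "b", "c")),  # one factor over the whole clique
]

triangle_as_pairwise_factors = [
    ("f1", ("a", "b")),       # three pairwise factors that
    ("f2", ("b", "c")),       # induce exactly the same
    ("f3", ("a", "c")),       # undirected edges
]

for name, scope in triangle_as_pairwise_factors:
    print(name, "touches", scope)
```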
Separation
A -- S -- B
when S is not observed, influence can flow from A to B and vice versa through S
when S is observed, it blocks the flow of influence between A and B: they are separated (a minimal connectivity test follows below)
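A minimal sketch of this separation test, assuming the graph is given as an adjacency dict: A and B are separated by observed S exactly when removing S disconnects them:

```python
from collections import deque

def separated(adj, a, b, observed):
    """BFS from a, refusing to enter observed nodes; a and b are
    separated given `observed` iff b is never reached."""
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False
        for nxt in adj[node]:
            if nxt not in seen and nxt not in observed:
                seen.add(nxt)
                queue.append(nxt)
    return True

# the A -- S -- B example from above
adj = {"A": ["S"], "S": ["A", "B"], "B": ["S"]}
print(separated(adj, "A", "B", observed=set()))  # False: influence flows
print(separated(adj, "A", "B", observed={"S"}))  # True: S blocks the path
```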
d-separation: in directed models, A and B are d-separated given observed nodes S if all paths from A to B are blocked (a moralization-based test follows after 'Converting between graphs')
Converting between graphs
from directed to undirected: must add an edge between unconnected coparents (moralization)
from undirected to directed
2) add edges to triangulate long loops
3) assign direction to edges, no directed cycles allowed
1) no loops of length greater than three allowed
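The moralization step above also gives a standard test for the d-separation mentioned earlier: A and B are d-separated given S iff they are separated in the moralized graph of the ancestral set of A, B, and S. A self-contained sketch (the `separated` helper is the same BFS test as in the separation sketch above):

```python
from collections import deque

def separated(adj, a, b, observed):
    # BFS from a, refusing to enter observed nodes
    seen, queue = {a}, deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return False
        for nxt in adj[node]:
            if nxt not in seen and nxt not in observed:
                seen.add(nxt)
                queue.append(nxt)
    return True

def ancestors(parents, nodes):
    # all nodes in `nodes` plus their ancestors in the DAG
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in result:
                result.add(p)
                stack.append(p)
    return result

def moralize(parents, keep):
    # undirected adjacency over `keep`: original edges plus an edge
    # between every pair of coparents ("marrying the parents")
    adj = {v: set() for v in keep}
    for child in keep:
        ps = [p for p in parents[child] if p in keep]
        for p in ps:
            adj[child].add(p); adj[p].add(child)
        for i, p in enumerate(ps):  # connect unconnected coparents
            for q in ps[i + 1:]:
                adj[p].add(q); adj[q].add(p)
    return adj

def d_separated(parents, a, b, observed):
    keep = ancestors(parents, {a, b} | set(observed))
    return separated(moralize(parents, keep), a, b, set(observed))

# collider A -> C <- B: A and B are d-separated until C is observed
parents = {"A": [], "B": [], "C": ["A", "B"]}
print(d_separated(parents, "A", "B", set()))  # True
print(d_separated(parents, "A", "B", {"C"}))  # False
```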
Sampling
Directed Models
ancestral sampling: pass through the graph in topological order, sampling each node given its parents (see the sketch below)
harder to sample some nodes given other nodes, unless the observed nodes come first in the topological order
easy and fast to draw fair samples from the whole model
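A minimal ancestral-sampling sketch over an invented binary chain a → b → c (the CPT numbers are made up):

```python
import random

# a made-up binary network a -> b -> c; each CPT maps a configuration
# of the parents to P(node = 1 | parents)
model = {
    "a": ([], {(): 0.3}),
    "b": (["a"], {(0,): 0.1, (1,): 0.8}),
    "c": (["b"], {(0,): 0.2, (1,): 0.9}),
}
topological_order = ["a", "b", "c"]

def ancestral_sample(model, order):
    # visit nodes in topological order: every parent is already
    # sampled by the time its child is reached
    sample = {}
    for node in order:
        parents, cpt = model[node]
        p_one = cpt[tuple(sample[p] for p in parents)]
        sample[node] = int(random.random() < p_one)
    return sample

print(ancestral_sample(model, topological_order))
```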
Undirected Models
usually requires Markov chains (Gibbs sampling; a sketch follows below)
usually cannot be done exactly
usually requires multiple iterations even to approximate
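A minimal Gibbs-sampling sketch for a made-up pairwise binary model: each step resamples one variable from its conditional given the rest, and the partition function cancels, so only the unnormalized P' is ever evaluated:

```python
import random

# made-up pairwise potentials on a chain a - b - c
phi = {("a", "b"): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0},
       ("b", "c"): {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 2.0, (1, 1): 1.0}}
variables = ["a", "b", "c"]

def unnormalized(state):
    # P'(X): product of clique potentials (Z is never needed below)
    return (phi[("a", "b")][(state["a"], state["b"])]
            * phi[("b", "c")][(state["b"], state["c"])])

def gibbs_sweep(state):
    # resample each variable from P(x | rest); the partition function
    # cancels in the ratio, so P' suffices
    for v in variables:
        weights = []
        for value in (0, 1):
            state[v] = value
            weights.append(unnormalized(state))
        state[v] = int(random.random() < weights[1] / sum(weights))
    return state

state = {v: random.randint(0, 1) for v in variables}
for _ in range(1000):  # many iterations even to approximate
    state = gibbs_sweep(state)
print(state)           # one (approximate) sample from P
```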
Learning about Dependencies
Learning Graph Structure
1) try out several graphs
2) see which graph does the best job on some criterion (training/validation set)
3) iterative search (remove an edge, add an edge, flip an edge); a hill-climbing sketch follows below
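A hill-climbing sketch of the iterative search in step 3; `score` is a hypothetical stand-in for whatever criterion step 2 uses (e.g. validation log-likelihood), and a real implementation would also reject candidates containing directed cycles:

```python
import itertools

def neighbors(edges, nodes):
    # candidate graphs one move away: remove, flip, or add an edge
    for e in edges:
        yield edges - {e}                # remove
        yield (edges - {e}) | {e[::-1]}  # flip
    for e in itertools.permutations(nodes, 2):
        if e not in edges:
            yield edges | {e}            # add

def hill_climb(edges, nodes, score):
    # greedy search: take the best single move until nothing improves
    best, best_score = edges, score(edges)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(best, nodes):
            s = score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
    return best

# toy demo with a hypothetical score that simply prefers fewer edges
nodes = ("a", "b", "c")
print(hill_climb(frozenset({("a", "b"), ("b", "c")}), nodes,
                 score=lambda g: -len(g)))
```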
Use Latent Variables
use many latent variables, which cannot be observed, and add dense connections from latent variables to observed variables
parameters learn that each latent variable interacts strongly with only a small subset of observed variables
use one graph structure, trainable with gradient descent (an RBM-style sketch follows below)
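A sketch of this idea using an RBM-style energy, one instance of the dense latent-to-observed pattern (the sizes, initialization, and energy form here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3

# one fixed, densely connected bipartite structure: every latent
# (hidden) unit touches every observed (visible) unit
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b = np.zeros(n_visible)  # visible biases
c = np.zeros(n_hidden)   # hidden biases

def energy(v, h):
    # RBM energy E(v, h) = -v.b - h.c - v.W.h, with P(v, h) = exp(-E) / Z
    return -(v @ b + h @ c + v @ W @ h)

# training adjusts W by gradient descent on one fixed structure;
# typically each hidden unit ends up interacting strongly with only
# a small subset of visible units, even though the graph is dense
v = rng.integers(0, 2, size=n_visible)
h = rng.integers(0, 2, size=n_hidden)
print(energy(v, h))
```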