Mixture models and the EM algorithm
Latent variable models
Why?
Assume that the observed variables are correlated because they arise from a hidden "cause". LVMs allow you to model such processes.
Advantages
Parameter reduction
Hidden variables can serve as a bottleneck, which computes a compressed version of the data.
Disadvantages
Mixture models
Mixture of Gaussians
Mixture of multinoullis
Clustering
Mixture of experts
Application to inverse problems
Parameter estimation for mixture models
Unidentifiability
Computing the MAP estimate is a non-convex problem
The EM algorithm
Intuition
MLE/MAP is easy when we observe values of every random variable (i.e. we have complete data).
MLE/MAP is difficult if we have missing data and/or latent variables.
One approach is to minimise the negative log likelihood \( \mathrm{NLL}(\theta) = -\log p(\mathcal{D} \mid \theta) \).
But often we have constraints, such as covariance matrices being positive definite and mixing weights summing to one, which can make direct optimisation tricky. It can be done though: use a Cholesky decomposition \( \Sigma = LL^T \) (optimise \( L \)) and logits for \( \pi \); see the sketch below.
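A minimal sketch of this unconstrained reparameterisation (an assumed illustration, not code from the source): `unpack_params` is a hypothetical helper that maps a flat parameter vector to valid mixing weights via a softmax, and to positive-definite covariances via Cholesky factors with exponentiated diagonals, so a generic gradient-based optimiser can minimise the NLL with no explicit constraints.

```python
import numpy as np

def unpack_params(raw, K, D):
    """Hypothetical helper: map an unconstrained vector to valid GMM parameters.

    Assumed layout of `raw`: K logits for pi, then K*D mean entries,
    then D*(D+1)/2 lower-triangular Cholesky entries per component.
    """
    i = 0
    logits = raw[i:i + K]; i += K
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                                   # softmax: positive, sums to one

    mus = raw[i:i + K * D].reshape(K, D); i += K * D

    ntri = D * (D + 1) // 2
    Sigmas = []
    for _ in range(K):
        L = np.zeros((D, D))
        L[np.tril_indices(D)] = raw[i:i + ntri]; i += ntri
        L[np.diag_indices(D)] = np.exp(np.diag(L))   # positive diagonal
        Sigmas.append(L @ L.T)                       # Sigma = L L^T is positive definite
    return pi, mus, np.array(Sigmas)
```

Any unconstrained optimiser (e.g. `scipy.optimize.minimize` applied to the NLL as a function of `raw`) then respects the constraints by construction.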
EM is an iterative algorithm, often with closed-form updates at each step. EM also automatically enforces the required constraints!
EM exploits the fact that if the data were fully observed, then the MLE/MAP would be easy to compute.
EM alternates between inferring the missing values given the parameters (E step) and then optimising the parameters given the "filled in" data (M step); see the GMM sketch below.
EM for GMMs
EM for mixture of experts
EM for DGMs with hidden variables
EM for the Student distribution
EM for probit regression
Theoretical basis for EM
Special case of a larger class of algorithms called bound optimisation or MM algorithms (minorise-maximise); see the bound below.
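As a hedged sketch of the minorise-maximise idea (standard bound-optimisation notation, not quoted from the source): each iteration builds a surrogate \( Q(\theta, \theta^t) \) that lower-bounds the log likelihood \( \ell(\theta) \) and is tight at the current estimate, then maximises the surrogate, which guarantees monotone improvement.
\[
Q(\theta, \theta^t) \le \ell(\theta) \;\; \forall \theta, \qquad
Q(\theta^t, \theta^t) = \ell(\theta^t), \qquad
\theta^{t+1} = \arg\max_{\theta} Q(\theta, \theta^t)
\;\Rightarrow\; \ell(\theta^{t+1}) \ge \ell(\theta^t).
\]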
Online EM
EM variants
Annealed EM
Variational EM
Monte Carlo EM
Generalised EM
Expectation conditional maximisation
Over-relaxed EM
Basic idea
Want to maximise the log likelihood of the observed data (equivalently, minimise the NLL); see the formula below.
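As an assumed clarification in standard notation (not quoted from the source): with latent variables \( z_i \), the observed-data log likelihood marginalises them out, so the log sits outside a sum and direct optimisation is hard, which is what motivates EM's lower-bound approach.
\[
\ell(\theta) = \sum_{i=1}^{N} \log p(x_i \mid \theta)
             = \sum_{i=1}^{N} \log \sum_{z_i} p(x_i, z_i \mid \theta)
\]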
Model selection for latent variable methods
Fitting models with missing data