Bishop
Mixture Models
- assumption: the joint distribution \( p(X,Z) \) is easy, the marginal distribution \( p(X) \) is complex
- Problem set
Simple: K-Means
- Intuitively, a cluster is a group of points which minimizes in-group distances
- -> minimize \( J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2 \)
Basic Algorithm
- Initialize \( \mu_k \)
- Step 1 (Exp): minimize J with respect to \( r_{nk} \) (assign points to the closest cluster mean)
- Step 2 (Max): minimize J with respect to \( \mu_k \) (update \( \mu_k \) to the mean of the points in cluster k)
- repeat Steps 1-2 until convergence (see the sketch below)
Figure, from left to right: init, Step 1, Step 2
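A minimal NumPy sketch of these two alternating steps, assuming Euclidean distance and random-point initialization (names and structure are my own illustration, not from the text):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Basic K-means: alternate Step 1 (assignment) and Step 2 (mean update)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # initialize mu_k with K random data points
    for _ in range(n_iter):
        # Step 1: minimize J w.r.t. r_nk -> assign each point to the closest cluster mean
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        r = d2.argmin(axis=1)
        # Step 2: minimize J w.r.t. mu_k -> set mu_k to the mean of its assigned points
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):  # stop once the means no longer move
            break
        mu = new_mu
    return mu, r
```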
Online Algorithm
- Robbins-Monro procedure
- sequential update \( \mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n (x_n - \mu_k^{\text{old}}) \) for each new data point \( x_n \)
- why? what's the purpose? stop early?
Generalization:
in the M step, restrict \( \mu_k \) to be one of the data points and search over all combinations, \( \mathcal O(N^2) \) (K-medoids; see the sketch below)
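A small sketch (my own illustration) of that restricted M step for a single cluster: pick the member point with the smallest total dissimilarity to the other members, which costs \( \mathcal O(N^2) \) evaluations per cluster.

```python
import numpy as np

def medoid_update(cluster_points, dissimilarity=lambda a, b: np.sum(np.abs(a - b))):
    """Restricted M step: the prototype must be one of the cluster's own data points.
    Evaluates all pairwise dissimilarities, hence O(N^2) for N cluster members."""
    costs = [sum(dissimilarity(x, y) for y in cluster_points) for x in cluster_points]
    return cluster_points[int(np.argmin(costs))]
```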
Mixture of Gaussian
- -> reduce a complicated distribution to a sum of easy distributions
- look at the posterior \( p(z_k = 1 \mid x) \) (the responsibility; written out below)
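For reference, the standard mixture density and the responsibility the note refers to (Bishop's notation):
\[
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal N(x \mid \mu_k, \Sigma_k), \qquad
\gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k \, \mathcal N(x \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal N(x \mid \mu_j, \Sigma_j)}
\]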
Graphical Model
Plot of responsibilities using true parameters
Naive approach: Maximum Likelihood
- Difficult because of the sum inside the logarithm :red_cross:
- gradient with respect to \( \mu_k \)
- gradient with respect to \( \Sigma_k \)
- gradient with respect to \( \pi_k \) using Lagrange multipliers (stationary conditions collected after this list)
Problem: the responsibilities depend on the parameters! :red_cross:
Possible problem: singularities of the likelihood for mixtures with several components, when one Gaussian collapses onto a single data point (extreme overfitting) :red_cross:
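Setting these gradients to zero gives the standard stationary conditions, in which the responsibilities \( \gamma(z_{nk}) \) appear on the right-hand sides; this is exactly the circular dependence flagged above:
\[
N_k = \sum_{n=1}^{N} \gamma(z_{nk}), \qquad
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n, \qquad
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf T}, \qquad
\pi_k = \frac{N_k}{N}
\]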
EM for Gaussian Mixture :check:
- choose initial values for the parameters (means, covariances, mixing coefficients)
- evaluate the responsibilities for all data points (E step)
- use the responsibilities to update the parameters (M step)
- commonly use K-means to initialize the means (see the sketch below)
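A compact NumPy/SciPy sketch of one EM iteration following the steps above (helper and variable names are my own; a practical implementation would work in log space and monitor the log likelihood for convergence):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a Gaussian mixture.
    X: (N, D) data; pi: (K,) mixing coefficients; mu: (K, D) means; Sigma: (K, D, D) covariances."""
    N, _ = X.shape
    K = len(pi)
    # E step: evaluate responsibilities gamma(z_nk) for all data points
    dens = np.stack([pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)  # (N, K)
    # M step: use the responsibilities to update the parameters
    Nk = gamma.sum(axis=0)                          # effective number of points per component
    mu_new = (gamma.T @ X) / Nk[:, None]
    Sigma_new = np.empty_like(Sigma)
    for k in range(K):
        diff = X - mu_new[k]
        Sigma_new[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    pi_new = Nk / N
    return pi_new, mu_new, Sigma_new, gamma
```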
Visualization
Gaussian Mixtures revisited
- complete-data likelihood \( p(X, Z \mid \mu, \Sigma, \pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \big( \pi_k \, \mathcal N(x_n \mid \mu_k, \Sigma_k) \big)^{z_{nk}} \)
- log likelihood \( \ln p(X, Z \mid \mu, \Sigma, \pi) = \sum_n \sum_k z_{nk} \{ \ln \pi_k + \ln \mathcal N(x_n \mid \mu_k, \Sigma_k) \} \)
- sum of K independent contributions (one per component) -> closed-form solutions for each class
Graphical Model for complete Data
If we do not have complete data, look at the posterior
- -> the posterior factorizes over the \( z_n \) (d-separation)
- expected complete-data log likelihood \( \mathbb E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] = \sum_n \sum_k \gamma(z_{nk}) \{ \ln \pi_k + \ln \mathcal N(x_n \mid \mu_k, \Sigma_k) \} \)
- which shows that the results we motivated/"derived" for the Gaussian mixture follow from our alternative, more abstract view
Equivalence to K-means as the variance goes to zero (the responsibilities become hard 0/1 assignments)
General EM
consider the log likelihood \( \ln p(X \mid \theta) \) of a latent variable model together with any distribution \( q(Z) \) over the latent variables
- this can be rewritten as \( \ln p(X \mid \theta) = \mathcal L(q, \theta) + \mathrm{KL}\big(q \,\|\, p(Z \mid X, \theta)\big) \), with the ELBO \( \mathcal L(q, \theta) = \sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)} \)
-> E step: minimize the KL by setting \( q(Z) = p(Z \mid X, \theta^{\text{old}}) \) -> the ELBO is now a tight bound
-> M step: maximise \( \mathcal L(q, \theta) \) with respect to \( \theta \), i.e. maximise \( \mathcal Q(\theta, \theta^{\text{old}}) = \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta) \)
VAE
\( \mathcal L(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)} \, dZ = \int q(Z) \ln p(X \mid Z) \, dZ + \int q(Z) \ln \frac{p(Z)}{q(Z)} \, dZ \)
- the log likelihood \( \ln p(X \mid \theta) = \sum_n \ln p(x_n \mid \theta) \) has a lower bound \( \sum_n \mathcal L_n \) given by a sum of the per-datapoint ELBOs above
- this lower bound needs to be maximised with respect to the parameters
- approximate q and p with neural networks; restrict them, for example, to the Gaussian family, and demand that the q network (and of course also the p network) is the same for every data point
- the ELBO is an expectation of two terms with respect to q(z); we approximate this by sampling from q(z) (see the sketch below)
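A minimal NumPy sketch of a single-sample Monte Carlo estimate of one datapoint's ELBO, assuming a Gaussian \( q(z \mid x) \), a standard normal prior \( p(z) \), and a Gaussian decoder with unit variance (the function names `encode` and `decode` are illustrative, not from the text):

```python
import numpy as np

def elbo_estimate(x, encode, decode, rng=np.random.default_rng(0)):
    """Single-sample estimate of L_n = E_q[ln p(x|z)] - KL(q(z|x) || p(z)).
    encode(x) -> (mu, log_var) of the Gaussian q(z|x); decode(z) -> mean of a unit-variance Gaussian p(x|z)."""
    mu, log_var = encode(x)
    # Reparameterized sample z ~ q(z|x), so the estimate stays differentiable in (mu, log_var)
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
    # Reconstruction term: log density of x under N(decode(z), I)
    x_hat = decode(z)
    log_px_z = -0.5 * np.sum((x - x_hat) ** 2) - 0.5 * x.size * np.log(2 * np.pi)
    # Analytic KL between N(mu, diag(exp(log_var))) and the standard normal prior N(0, I)
    kl = 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)
    return log_px_z - kl
```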
Alternative view of EM
- abstract goal: find the maximum likelihood solution of a latent variable model with parameters \( \theta \)
- assume we have the complete data set \( \{X, Z\} \); maximising \( \ln p(X, Z \mid \theta) \) is usually simple -> see Gaussian mixture complete data
- if we do not have complete data, consider the expected value of the complete-data log likelihood under the posterior, \( \mathcal Q(\theta, \theta^{\text{old}}) = \sum_Z p(Z \mid X, \theta^{\text{old}}) \ln p(X, Z \mid \theta) \)
Mixtures of Bernoulli
- introduce a one-hot latent z
do the same E and M steps again (collected below)
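Concretely, the E and M steps for a mixture of Bernoulli distributions (standard results, stated here for reference; \( x_{ni} \in \{0,1\} \)):
\[
\gamma(z_{nk}) = \frac{\pi_k \prod_i \mu_{ki}^{x_{ni}} (1-\mu_{ki})^{1-x_{ni}}}{\sum_j \pi_j \prod_i \mu_{ji}^{x_{ni}} (1-\mu_{ji})^{1-x_{ni}}}, \qquad
\mu_k = \frac{1}{N_k} \sum_n \gamma(z_{nk}) \, x_n, \qquad
\pi_k = \frac{N_k}{N}, \qquad N_k = \sum_n \gamma(z_{nk})
\]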
Bayesian Linear Regression
- goal: solve the task arising in the evidence framework, the maximisation of \( p(\alpha, \beta \mid \mathcal D) \) assuming a flat prior, which means maximising the evidence \( p(\mathbf t \mid \alpha, \beta) \)
- consider \( \mathbf w \) as a latent variable
- E step: calculate the posterior distribution of \( \mathbf w \) given \( \alpha, \beta, \mathcal D \)
- the complete-data log likelihood is \( \ln p(\mathbf t, \mathbf w \mid \alpha, \beta) = \ln p(\mathbf t \mid \mathbf w, \beta) + \ln p(\mathbf w \mid \alpha) \)
- take the expectation of the complete-data log likelihood with respect to the posterior of the latent variable to get a lower bound on \( \ln p(\mathbf t \mid \alpha, \beta) \), and maximise this to get the parameters that maximise \( p(\alpha, \beta \mid \mathcal D) \) assuming a flat prior!
- next step: maximise with respect to the parameters \( \alpha, \beta \)
-> same stationary points as the evidence framework (simple calculation), but different re-estimation equations :red_cross: (collected below)
- -> analytical integration over \( \mathbf w \) and subsequently setting the gradient to zero is equivalent to EM
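For reference, the quantities involved (following Bishop's treatment; \( m_N, S_N \) denote the posterior mean and covariance of \( \mathbf w \), \( M \) the number of weight parameters, \( \boldsymbol\phi_n \) the basis-function vector of data point \( n \)):
- E step: \( p(\mathbf w \mid \mathbf t, \alpha, \beta) = \mathcal N(\mathbf w \mid m_N, S_N) \)
- M step:
\[
\alpha^{\text{new}} = \frac{M}{m_N^{\mathsf T} m_N + \operatorname{Tr}(S_N)}, \qquad
\frac{1}{\beta^{\text{new}}} = \frac{1}{N} \sum_{n=1}^{N} \Big\{ \big(t_n - m_N^{\mathsf T} \boldsymbol\phi_n\big)^2 + \boldsymbol\phi_n^{\mathsf T} S_N \boldsymbol\phi_n \Big\}
\]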
Approximate Inference
Task
- evaluate the posterior \( p(Z \mid X) \) or calculate expectations with respect to the posterior