Bishop
Mixture Models
- assumption: the joint distribution \( p(X,Z) \) is easy, the marginal distribution \( p(X) \) is complex
- Problem set

Simple: K-Means
- Intuitively, a cluster is a group of points which minimizes in-group distances
- -> minimize the distortion \( J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk}\, \lVert x_n - \mu_k \rVert^2 \)

Basic Algorithm
- Initialize \( \mu_k \)
- Step 1 (Exp): minimize J with respect to \( r_{nk} \) (assign each point to the closest cluster mean)

- Step 2 (Max): minimize J with respect to \( \mu_k \) (update \( \mu_k \) to the mean of the points in its cluster)

- repeat Steps 1-2 until convergence (see the sketch below)

(Figure: from left to right: initialization, Step 1, Step 2)
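A minimal NumPy sketch of the two alternating steps; the function name, the random-data-point initialization, and the stopping rule are illustrative choices, not from the source:

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Minimal K-means: alternate the assignment step and the mean-update step."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]             # initialise mu_k with K random points
    for _ in range(n_iters):
        # Step 1 (Exp): assign each point to the closest cluster mean (minimise J w.r.t. r_nk)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
        r = d2.argmin(axis=1)
        # Step 2 (Max): update each mean to the average of its assigned points (minimise J w.r.t. mu_k)
        new_mu = np.array([X[r == k].mean(axis=0) if np.any(r == k) else mu[k] for k in range(K)])
        if np.allclose(new_mu, mu):                               # stop once the means no longer move
            break
        mu = new_mu
    return mu, r

# usage sketch
mu, r = kmeans(np.random.randn(200, 2), K=3)
```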
Online Algorithm
- Robbins-Monro procedure
- sequential update of the closest prototype for each new data point: \( \mu_k^{\text{new}} = \mu_k^{\text{old}} + \eta_n (x_n - \mu_k^{\text{old}}) \) (see the sketch after this list)

- why? what is the purpose? can it be stopped early?
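A sketch of the sequential update, assuming a per-cluster decreasing step size \( \eta_n = 1/N_k \) (one common Robbins-Monro choice); function and variable names are illustrative:

```python
import numpy as np

def online_kmeans_update(x, mu, counts):
    """One sequential (Robbins-Monro style) K-means update for a single new point x."""
    k = int(np.argmin(((mu - x) ** 2).sum(axis=1)))   # closest prototype
    counts[k] += 1
    eta = 1.0 / counts[k]                             # decreasing step size eta_n
    mu[k] += eta * (x - mu[k])                        # mu_k_new = mu_k_old + eta_n * (x - mu_k_old)
    return mu, counts

# usage sketch on a stream of points
mu, counts = np.random.randn(3, 2), np.zeros(3)
for x in np.random.randn(100, 2):
    mu, counts = online_kmeans_update(x, mu, counts)
```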
Generalization:

in the M step restrict \( \mu_k \) to be one of the data points and search over all candidates in the cluster, which costs \( \mathcal O(N^2) \) per cluster (K-medoids; see the sketch below)
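A sketch of this restricted M step for one cluster, assuming an arbitrary dissimilarity measure; the default Manhattan distance and the function name are illustrative:

```python
import numpy as np

def medoid_update(X_k, dissim=lambda a, b: np.abs(a - b).sum()):
    """Restricted M step for one cluster: choose the assigned data point that
    minimises the summed dissimilarity to all other points in the cluster (O(N_k^2))."""
    costs = [sum(dissim(x_i, x_j) for x_j in X_k) for x_i in X_k]
    return X_k[int(np.argmin(costs))]

# usage sketch
new_prototype = medoid_update(np.random.randn(50, 2))
```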
Mixture of Gaussians



- \( p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal N(x \mid \mu_k, \Sigma_k) \)
- -> reduces a complicated distribution to a sum of easy distributions
- look at the posterior (responsibility) \( \gamma(z_k) \equiv p(z_k = 1 \mid x) = \frac{\pi_k\, \mathcal N(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j\, \mathcal N(x \mid \mu_j, \Sigma_j)} \)

Graphical Model 
Plot of the responsibilities using the true parameters

Naive approach: Maximum Likelihood

- difficult because of the sum inside the logarithm: \( \ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \Big\{ \sum_{k=1}^{K} \pi_k\, \mathcal N(x_n \mid \mu_k, \Sigma_k) \Big\} \) :red_cross:
- gradient with respect to \( \mu_k \): setting it to zero gives \( \mu_k = \frac{1}{N_k} \sum_n \gamma(z_{nk})\, x_n \) with \( N_k = \sum_n \gamma(z_{nk}) \)

- gradient with respect to \( \Sigma_k \): gives \( \Sigma_k = \frac{1}{N_k} \sum_n \gamma(z_{nk})\, (x_n - \mu_k)(x_n - \mu_k)^{\mathsf T} \)

- gradient with respect to \( \pi \) using Lagrange multipliers (constraint \( \sum_k \pi_k = 1 \)): gives \( \pi_k = \frac{N_k}{N} \)


Problem: the responsibilities \( \gamma(z_{nk}) \) themselves depend on the parameters, so these are not closed-form solutions! :red_cross:
Possible problem: singularities of the likelihood for \( K \ge 2 \) components, when one component collapses onto a single data point (extreme overfitting) :red_cross:

EM for Gaussian Mixture :check:
- choose initial values for the parameters (means, covariances, mixing coefficients)
- E step: evaluate the responsibilities for all data points
- M step: use the responsibilities to update the parameters
- commonly use K-means to initialize the means (see the sketch below)
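A minimal NumPy/SciPy sketch of these steps; the random initialization, the small regularizer added to the covariances, and the function name are illustrative choices, not from the source:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """Minimal EM for a Gaussian mixture: E step = responsibilities, M step = parameter updates."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # initialise the means (commonly done with K-means; here simply K random data points)
    mu = X[rng.choice(N, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(D)] * K)       # shared initial covariance
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: evaluate the responsibilities gamma(z_nk) for all data points
        weighted = np.stack(
            [pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k]) for k in range(K)],
            axis=1)                                              # shape (N, K)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M step: use the responsibilities to update the parameters
        Nk = gamma.sum(axis=0)                                   # effective number of points per component
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N
    return pi, mu, Sigma, gamma

# usage sketch (real data would go here)
pi, mu, Sigma, gamma = em_gmm(np.random.randn(300, 2), K=2)
```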
Visualization 
Gaussian Mixtures revisited
- complete-data likelihood \( p(X, Z \mid \mu, \Sigma, \pi) = \prod_{n=1}^{N} \prod_{k=1}^{K} \pi_k^{z_{nk}}\, \mathcal N(x_n \mid \mu_k, \Sigma_k)^{z_{nk}} \)
- log likelihood \( \ln p(X, Z \mid \mu, \Sigma, \pi) = \sum_n \sum_k z_{nk} \big\{ \ln \pi_k + \ln \mathcal N(x_n \mid \mu_k, \Sigma_k) \big\} \)

- sum of K independent contributions -> closed-form solution for each class
Graphical Model for complete data

If we do not have complete data, look at the posterior

- -> the posterior factorizes over the \( z_n \) (d-separation)
- expected complete-data log likelihood: \( \mathbb E_Z[\ln p(X, Z \mid \mu, \Sigma, \pi)] = \sum_n \sum_k \gamma(z_{nk}) \big\{ \ln \pi_k + \ln \mathcal N(x_n \mid \mu_k, \Sigma_k) \big\} \)

- which now shows that the results we motivated/"derived" for the Gaussian mixture follow from this alternative, more abstract view
Equivalence to K-means as the variance goes to zero
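One way to see this, sketched here under the standard assumption of shared covariances \( \epsilon I \): the responsibilities become the hard assignments of K-means as \( \epsilon \to 0 \),
\[
\gamma(z_{nk}) = \frac{\pi_k \exp\{-\lVert x_n - \mu_k \rVert^2 / 2\epsilon\}}{\sum_j \pi_j \exp\{-\lVert x_n - \mu_j \rVert^2 / 2\epsilon\}}
\;\longrightarrow\;
r_{nk} =
\begin{cases}
1 & \text{if } k = \operatorname{arg\,min}_j \lVert x_n - \mu_j \rVert^2, \\
0 & \text{otherwise,}
\end{cases}
\]
and the expected complete-data log likelihood then reduces (up to constants) to \( -\tfrac{1}{2} J \), the K-means objective.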

General EM
consider the decomposition \( \ln p(X \mid \theta) = \mathcal L(q, \theta) + \mathrm{KL}(q \,\|\, p) \) for any distribution \( q(Z) \)
- this can be rewritten as an ELBO term plus a non-negative KL term (see the worked decomposition below)


-> E step: minimize the KL by setting \( q(Z) = p(Z \mid X, \theta^{\text{old}}) \)
-> the ELBO is now a tight bound
-> M step: maximise \( \mathcal L(q, \theta) \) with respect to \( \theta \)
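Spelled out (a sketch of the standard decomposition, with \( q(Z) \) any distribution over the latent variables):
\[
\ln p(X \mid \theta)
= \underbrace{\sum_Z q(Z) \ln \frac{p(X, Z \mid \theta)}{q(Z)}}_{\mathcal L(q, \theta)\ \text{(ELBO)}}
\;+\;
\underbrace{\Big( - \sum_Z q(Z) \ln \frac{p(Z \mid X, \theta)}{q(Z)} \Big)}_{\mathrm{KL}(q \,\|\, p)\ \ge\ 0}
\]
Setting \( q(Z) = p(Z \mid X, \theta^{\text{old}}) \) in the E step makes the KL term zero, so \( \mathcal L(q, \theta^{\text{old}}) = \ln p(X \mid \theta^{\text{old}}) \); the M step then increases \( \mathcal L \) (and hence the log likelihood) by updating \( \theta \).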
VAE
\( \mathcal L(q) = \int q(Z) \ln \frac{p(X, Z)}{q(Z)}\, dZ = \int q(Z) \ln p(X \mid Z)\, dZ + \int q(Z) \ln \frac{p(Z)}{q(Z)}\, dZ \)
- the log likelihood \( \ln p(X \mid \theta) = \sum_n \ln p(x_n \mid \theta) \) has a lower bound \( \sum_n \mathcal L_n \) given by a sum of the per-datapoint ELBOs above
- this lower bound needs to be maximised with respect to the parameters
- approximate q and p with neural networks, restrict them for example to the Gaussian family, and demand that the q network (and of course also the p network) is the same for every data point
- the ELBO is an expected value of two terms with respect to \( q(z) \); we approximate this expectation by sampling from \( q(z) \) (see the sketch below)
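A sketch of this setup, assuming PyTorch, a Gaussian encoder \( q(z \mid x) \), a Bernoulli decoder, and a single-sample Monte Carlo estimate of the expectation; the class name, layer sizes, and the closed-form Gaussian KL term (a common choice instead of sampling that term too) are illustrative:

```python
import torch
import torch.nn as nn

class GaussianVAE(nn.Module):
    """Sketch of a VAE: amortized Gaussian q(z|x), standard normal prior p(z), Bernoulli decoder p(x|z).
    The same encoder/decoder networks are shared across all data points (amortization)."""
    def __init__(self, x_dim=784, z_dim=20, h_dim=200):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        # q(z|x): diagonal Gaussian produced by the encoder network
        h = self.enc(x)
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # single-sample Monte Carlo estimate of the expectation via the reparameterization trick
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # E_q[ln p(x|z)]: Bernoulli reconstruction term, estimated with the sampled z
        logits = self.dec(z)
        recon = -nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="none").sum(-1)
        # E_q[ln p(z) - ln q(z|x)] = -KL(q || p), available in closed form for two Gaussians
        kl = 0.5 * (torch.exp(logvar) + mu**2 - 1.0 - logvar).sum(-1)
        return (recon - kl).mean()    # average per-datapoint ELBO L_n over the batch

# usage sketch: maximise the ELBO (i.e. minimise -ELBO) by gradient ascent
model = GaussianVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 784)               # stand-in batch; real data would go here
opt.zero_grad()
loss = -model.elbo(x)
loss.backward()
opt.step()
```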
Alternative view of EM
- the abstract goal is to find the maximum likelihood solution of a latent-variable model with parameters \( \theta \)

- assume we have the complete data set \( \{X, Z\} \); maximising \( \ln p(X, Z \mid \theta) \) is usually simple -> see Gaussian mixture, complete data
- if we do not have complete data, consider the expected value of the complete-data log likelihood under the posterior: \( Q(\theta, \theta^{\text{old}}) = \sum_Z p(Z \mid X, \theta^{\text{old}})\, \ln p(X, Z \mid \theta) \)


Mixtures of Bernoulli distributions
- \( p(x \mid \mu, \pi) = \sum_{k=1}^{K} \pi_k\, p(x \mid \mu_k) \) with \( p(x \mid \mu_k) = \prod_{i=1}^{D} \mu_{ki}^{x_i} (1 - \mu_{ki})^{1 - x_i} \)

- introduce a one-hot latent z

do the same EM steps again (see the worked updates below) ...
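Spelled out for this model (a sketch of the resulting updates, with the same structure as for the Gaussian mixture):
\[
\text{E step:}\quad \gamma(z_{nk}) = \frac{\pi_k\, p(x_n \mid \mu_k)}{\sum_j \pi_j\, p(x_n \mid \mu_j)}
\]
\[
\text{M step:}\quad N_k = \sum_n \gamma(z_{nk}), \qquad
\mu_k = \frac{1}{N_k} \sum_n \gamma(z_{nk})\, x_n, \qquad
\pi_k = \frac{N_k}{N}
\]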
Bayesian Linear Regression
- goal: solve the task arising in the evidence framework, the maximisation of \( p(\alpha, \beta \mid \mathcal D) \) assuming a flat prior, which means maximising the evidence \( p(\mathbf t \mid \alpha, \beta) \)
- consider \( \mathbf w \) as a latent variable
- E step: calculate the posterior distribution of \( \mathbf w \) given \( \alpha, \beta, \mathcal D \)
- the complete-data log likelihood is \( \ln p(\mathbf t, \mathbf w \mid \alpha, \beta) = \ln p(\mathbf t \mid \mathbf w, \beta) + \ln p(\mathbf w \mid \alpha) \)
- take the expectation of the complete-data log likelihood with respect to the posterior of the latent variable to get a lower bound on \( p(\mathbf t) \), and maximise this to get the parameters that maximise \( p(\alpha, \beta \mid \mathcal D) \) assuming a flat prior!

- next step: maximise with respect to the parameters \( \alpha, \beta \)
-> same stationary point as the evidence framework (simple calculation), but a different re-estimation equation :red_cross:
- -> analytically integrating out \( \mathbf w \) (to get the evidence) and subsequently setting the gradient to zero is equivalent to EM (same fixed points)
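For the contrast mentioned above, the two re-estimation equations for \( \alpha \), as recalled from the standard treatment (so take the exact form as an assumption; \( \mathbf m_N, \mathbf S_N \) are the posterior mean and covariance of \( \mathbf w \), \( \lambda_i \) the eigenvalues of \( \beta \boldsymbol\Phi^{\mathsf T} \boldsymbol\Phi \)):
\[
\text{evidence framework:}\quad \alpha^{\text{new}} = \frac{\gamma}{\mathbf m_N^{\mathsf T} \mathbf m_N}, \qquad \gamma = \sum_i \frac{\lambda_i}{\alpha + \lambda_i}
\]
\[
\text{EM:}\quad \alpha^{\text{new}} = \frac{M}{\mathbb E[\mathbf w^{\mathsf T} \mathbf w]} = \frac{M}{\mathbf m_N^{\mathsf T} \mathbf m_N + \operatorname{Tr}(\mathbf S_N)}
\]
Both iterations share the same fixed points but take different steps towards them.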
Approximate Inference
Task
- evaluate the posterior \( p(Z \mid X) \) or calculate expectations with respect to the posterior