Dimensionality Reduction Methods

PCA (projection-based)

Criteria of PCs

Properties of PCs / terms

The first PC is the direction that captures the most variance of the data; each later PC captures the most remaining variance
(which PC contributes most?)

Data are centered; the PCs are mutually uncorrelated

If X ~ multivariate normal, the PCs are independent (but not identically distributed: each PC's variance is its eigenvalue)

PC loadings = eigenvector coefficients scaled by sqrt(eigenvalue); their sum of squares within a component = eigenvalue. They can be used to explain the relationship between variables and PCs
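
A minimal sketch of the properties above, assuming NumPy and a made-up toy data set (the mixing matrix A and all variable names are illustrative only). It assumes loadings are defined as eigenvector coefficients scaled by sqrt(eigenvalue), which is the convention under which their sum of squares per component equals the eigenvalue; the raw eigenvectors themselves have unit length.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 observations of 3 correlated variables
A = np.array([[2.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.5, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ A.T

# Center the data, then eigendecompose the sample covariance matrix
Xc = X - X.mean(axis=0)
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]          # reorder so PC1 has the largest variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                      # PC scores
loadings = eigvecs * np.sqrt(eigvals)      # each column scaled by sqrt(eigenvalue)

explained = eigvals / eigvals.sum()        # share of variance contributed by each PC
score_cov = np.cov(scores, rowvar=False)   # diagonal: the PCs are uncorrelated
```

Here explained answers "which PC contributes most?", and score_cov should equal diag(eigvals) up to rounding.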

Preprocessing data

Standardize data (correlation matrix); use when the variables have different units

Do not standardize (covariance matrix); the opposite case, when the variables share the same units

PC scores = sum of (coefficient × standardized variable). They can also be computed from the SVD
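
A small sketch of the two routes to PC scores on standardized data, assuming NumPy; the toy data and variable names are made up for illustration. The two routes agree only up to the sign of each component, since eigenvectors and singular vectors are defined up to sign.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical toy data whose columns have very different scales (units)
X = rng.normal(size=(100, 4)) * np.array([1.0, 10.0, 100.0, 0.1])
X[:, 1] += 5.0 * X[:, 0]                   # introduce some correlation

# Standardize: center, then divide by the sample standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Route 1: eigenvectors of the correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores_eig = Z @ eigvecs                   # sum of coefficient * standardized variable

# Route 2: SVD of the standardized data, Z = U diag(s) Vt
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
scores_svd = U * s                         # same scores, up to sign per column

# Singular values map to eigenvalues through s**2 / (n - 1)
n = Z.shape[0]
eig_from_svd = s ** 2 / (n - 1)
```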

Assumptions

Importance & covariance

The larger the variance, the more important the dynamics (large variance is treated as signal rather than noise)

Linear combination

Biplot

Can explain the correlation between variables via the directions of their arrows

Can circle the observations that correspond to one of the variables (to find outliers)

FA (model-based)

Motivation

Assumptions

Cov(F, U) = 0; Cov(U) = Ψ (diagonal)

E(F) = E(U) = 0

F, U ~ multivariate Gaussian
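
A short worked equation showing what these assumptions imply for the covariance of the observed variables. The loading matrix Λ, the mean μ and the normalization Cov(F) = I_k are not stated in these notes; they are assumed here as the usual orthogonal factor model conventions.

```latex
X - \mu = \Lambda F + U, \qquad \operatorname{Cov}(F) = I_k, \quad \operatorname{Cov}(U) = \Psi

\operatorname{Cov}(X)
  = \Lambda \operatorname{Cov}(F) \Lambda^{\top}
  + \Lambda \operatorname{Cov}(F, U)
  + \operatorname{Cov}(U, F) \Lambda^{\top}
  + \operatorname{Cov}(U)
  = \Lambda \Lambda^{\top} + \Psi
```

The cross terms vanish because Cov(F, U) = 0, so the off-diagonal covariances of X come only from the common factors, while Ψ adds variable-specific variance on the diagonal.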

FA vs PCA

Can the observed variables be explained by a linear combination of latent "common factors"?

How to select # of factors?

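The max # is the largest integer k with d = p(p+1)/2 − [p(k+1) − k(k−1)/2] > 0 (p = # of variables, k = # of factors, d = degrees of freedom)

The p-value of the goodness-of-fit test (H0: k factors are sufficient) should be larger than 0.05; otherwise the results are not robust/trustworthy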
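
A minimal sketch of the degrees-of-freedom count, reading d as (number of distinct covariance entries) minus (number of free parameters): d = p(p+1)/2 − [p(k+1) − k(k−1)/2], with p variables and k factors. The function names are hypothetical helpers, not part of any library.

```python
def fa_degrees_of_freedom(p: int, k: int) -> int:
    """Degrees of freedom of a k-factor model on p variables."""
    n_cov_entries = p * (p + 1) // 2               # distinct entries of Cov(X)
    n_free_params = p * (k + 1) - k * (k - 1) // 2 # loadings + uniquenesses - rotation constraints
    return n_cov_entries - n_free_params


def max_factors(p: int) -> int:
    """Largest k with strictly positive degrees of freedom (0 if none)."""
    k = 0
    while k + 1 <= p and fa_degrees_of_freedom(p, k + 1) > 0:
        k += 1
    return k


dof_6_2 = fa_degrees_of_freedom(6, 2)   # 21 - 17 = 4
kmax_6 = max_factors(6)                 # k = 3 gives d = 0, so the max is 2
kmax_10 = max_factors(10)               # k = 6 gives d = 0, so the max is 5
```

For example, with p = 6 variables at most 2 factors leave d > 0, which is why the number of factors cannot be chosen freely.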

FA rotation & scores

Rotation: varimax (orthogonal) / promax (oblique)

Scores: Bartlett / Thomson method

MDS

Rule of thumb: run FA if you wish to test a theoretical model of latent factors causing the observed variables; run PCA if you simply want to reduce your correlated observed variables

PCA explains the variance, while FA explains the covariance between the variables. In addition, FA is restricted by the degrees of freedom

Dissimilarity (distance) matrix

Manhattan

Maximum

Euclidean

Gower
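
A small sketch of the four dissimilarities above as pairwise matrices, assuming NumPy and a made-up 3 × 3 data set. The Gower line covers only the all-numeric case (mean of range-scaled absolute differences); mixed-type data needs per-type handling that is not shown here.

```python
import numpy as np

# Hypothetical toy data: 3 observations, 3 numeric variables
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 0.0, 3.0],
              [2.0, 2.0, 7.0]])

# Absolute differences for every pair of observations, shape (n, n, p)
diff = np.abs(X[:, None, :] - X[None, :, :])

manhattan = diff.sum(axis=2)                 # sum of absolute differences
maximum = diff.max(axis=2)                   # largest absolute difference
euclidean = np.sqrt((diff ** 2).sum(axis=2)) # square root of summed squares

# Gower, numeric-only case: scale each variable by its range, then average
ranges = X.max(axis=0) - X.min(axis=0)
gower_numeric = (diff / ranges).mean(axis=2)
```

For the first two rows the differences are (3, 2, 0), so the Manhattan distance is 5, the maximum distance is 3 and the Euclidean distance is sqrt(13).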

Similarity measure

Cosine measure

Correlation coefficient (ρ)

Converting similarity to dissimilarity: d = 1 − ρ
(we can then visualize the correlation matrix with MDS and spot clusters of variables easily)
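
A minimal sketch of the conversion d = 1 − ρ, assuming NumPy and simulated data in which variable 1 is built to track variable 0 (all names are illustrative). It also shows how the two similarity measures relate: the correlation of two variables is the cosine of their centered columns.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical toy data: 50 observations of 4 variables
X = rng.normal(size=(50, 4))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=50)   # variable 1 tracks variable 0

R = np.corrcoef(X, rowvar=False)   # similarity between variables
D = 1.0 - R                        # dissimilarity, ranges over [0, 2]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Correlation equals the cosine similarity of the centered columns
Xc = X - X.mean(axis=0)
cos01 = cosine_similarity(Xc[:, 0], Xc[:, 1])
```

Strongly correlated variables such as 0 and 1 end up with a dissimilarity near 0, so they would be placed close together in an MDS plot of D.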

MDS vs PCA

Classical MDS on Euclidean distances follows the same linear projection as PCA (PCA also uses Euclidean distance)

The ratio in the MDS loss function is similar to the % of variance unexplained in PCA

Converting distance to inner product:
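
One standard way to do this conversion in classical MDS is double centering, B = −(1/2) J D² J with J = I − 11ᵀ/n, where D² holds the squared distances. The notes do not spell out the formula, so this is a sketch under that assumption, using NumPy and made-up points. For Euclidean distances, B equals the inner-product (Gram) matrix of the centered data, and its top eigenvectors give the MDS coordinates.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical toy configuration: 6 points in 2 dimensions
X = rng.normal(size=(6, 2))

# Squared Euclidean distance matrix
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(axis=2)

# Double centering: B = -1/2 * J D2 J
n = D2.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ D2 @ J

# For Euclidean distances, B is the Gram matrix of the centered points
Xc = X - X.mean(axis=0)
gram = Xc @ Xc.T

# Classical MDS coordinates from the top 2 eigenpairs of B
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]
coords = eigvecs[:, top] * np.sqrt(eigvals[top])

# The recovered configuration reproduces the original squared distances
diff_c = coords[:, None, :] - coords[None, :, :]
D2_rec = (diff_c ** 2).sum(axis=2)
```

The coordinates match the PCA scores of X up to sign, which is the sense in which classical MDS and PCA share the same linear projection.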