Factor analysis vs Cluster analysis
Identification of the underlying factors
: includes clustering variables into homogeneous sets, creating new variables, and gaining knowledge about the categories
Screening of variables
: helpful in regression; identifies groupings so that one variable can be selected to represent many
Data collected are interval scaled
Multicollinearity in the data is desirable as the objective is to find out the interrelated set of variables
Models are usually based on linear relationships
Data should be open and responsive for factor analysis
Scree plot criterion: a plot of the eigenvalues against the number of factors, in order of extraction; the point where the curve levels off (the "elbow") indicates the number of factors to keep
Percentage of variance criterion: factors are kept until the cumulative percentage of variance explained reaches a satisfactory level
Eigenvalue criterion: the eigenvalue of a factor is the sum of the squared loadings of all variables on that factor; factors with eigenvalues greater than 1 are kept
Significance test criterion: the statistical significance of each eigenvalue is tested, and only the statistically significant factors are retained
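The eigenvalue and percentage-of-variance criteria above can be sketched in a few lines of Python. This is only an illustration: the loading matrix is made up, the function names are hypothetical, and the variance shares assume standardized variables (so total variance equals the number of variables).

```python
# Hypothetical factor loadings: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.8, 0.1],
    [0.1, 0.7, 0.2],
]

def eigenvalues_from_loadings(loadings):
    """Eigenvalue of each factor = sum of squared loadings in its column."""
    n_factors = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

def percent_variance(eigenvalues, n_variables):
    """Share of total variance per factor, assuming standardized variables
    (each variable contributes variance 1, so the total is n_variables)."""
    return [100.0 * ev / n_variables for ev in eigenvalues]

def kaiser_retained(eigenvalues, threshold=1.0):
    """Indices of the factors whose eigenvalue exceeds the threshold."""
    return [j for j, ev in enumerate(eigenvalues) if ev > threshold]

eigenvalues = eigenvalues_from_loadings(loadings)
shares = percent_variance(eigenvalues, n_variables=len(loadings))
retained = kaiser_retained(eigenvalues)

for j, (ev, share) in enumerate(zip(eigenvalues, shares)):
    status = "keep" if j in retained else "drop"
    print(f"factor {j + 1}: eigenvalue={ev:.2f}, variance={share:.1f}%, {status}")
```

With these invented loadings the first two factors pass the eigenvalue-greater-than-1 rule and the third does not. A scree plot would show the same eigenvalues plotted in order of extraction.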
Factor analysis: an exploratory analysis that groups similar variables (features) into dimensions. It simplifies the data by reducing the number of dimensions of the observations and is used mostly for data reduction.
Types of factoring
Canonical factor analysis
Common factor analysis
Principal component factoring
Factor regression model
To address the heterogeneity in each set of data
Taxonomy description: identifying groups within the data
Hypothesis generation or testing
The sample is representative of the population
Variables are not correlated. If variables are correlated, remove the correlated variables or use a distance measure that compensates for the correlation
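One distance measure that compensates for correlation is the Mahalanobis distance, which rescales differences by the inverse covariance matrix. The sketch below is restricted to two variables so the inverse can be written in closed form; the data points and function names are invented for illustration.

```python
import math

def covariance_2d(points):
    """Sample variances and covariance (var_x, var_y, cov_xy) of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    var_x = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    var_y = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    cov_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return var_x, var_y, cov_xy

def euclidean(a, b):
    """Ordinary straight-line distance; ignores correlation."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def mahalanobis(a, b, var_x, var_y, cov_xy):
    """sqrt(d^T S^-1 d), using the closed-form inverse of a 2x2 covariance S."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    det = var_x * var_y - cov_xy ** 2
    quad = (var_y * dx * dx - 2 * cov_xy * dx * dy + var_x * dy * dy) / det
    return math.sqrt(quad)

# Made-up, strongly correlated observations.
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
var_x, var_y, cov_xy = covariance_2d(data)

origin = (3.0, 3.0)
along_trend = (4.0, 4.0)    # moves with the correlation
against_trend = (4.0, 2.0)  # moves against the correlation

e_along = euclidean(origin, along_trend)
e_against = euclidean(origin, against_trend)
m_along = mahalanobis(origin, along_trend, var_x, var_y, cov_xy)
m_against = mahalanobis(origin, against_trend, var_x, var_y, cov_xy)

print(f"Euclidean:   along={e_along:.3f}, against={e_against:.3f}")
print(f"Mahalanobis: along={m_along:.3f}, against={m_against:.3f}")
```

The two test points are equally far from the origin in Euclidean terms, but the Mahalanobis distance treats the point lying against the correlation as much farther away.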
Types of clusters
Partitional clustering: K-means, fuzzy K-means, ISODATA
Density-based clustering: denclust, CLUPOT, mean shift, SVC, Parzen-Watershed
Hierarchical clustering: agglomerative and divisive methods
Define the problem
Decide the appropriate similarity measure
Decide on how to group the objects
Decide the number of clusters
Interpret, describe and validate the clusters
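As a rough illustration of these steps, here is a minimal K-means sketch in plain Python. Every choice in it is an assumption made for the example: Euclidean distance as the similarity measure, partitional grouping, two clusters, analyst-supplied starting centroids, and invented data.

```python
import math

def kmeans(points, initial_centroids, max_iterations=100):
    """Plain K-means with Euclidean distance.

    The number of clusters is len(initial_centroids); the starting
    centroids are supplied by the caller to keep the run deterministic.
    """
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Made-up observations with two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[-1]])

# Describe the clusters: size and centre of each.
for c, centre in enumerate(centroids):
    size = labels.count(c)
    print(f"cluster {c}: size={size}, centre=({centre[0]:.2f}, {centre[1]:.2f})")
```

The final loop covers only the "describe" part of the last step; interpreting and validating the clusters still requires judgment about the problem defined at the start.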
Tools to do cluster analysis
It will produce clusters regardless of the actual existence of any structure
Its results cannot be generalized, because they depend entirely on the variables used as the basis for the similarity measure
It is descriptive, atheoretical and non-inferential
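The first limitation is easy to reproduce. The sketch below runs a small single-linkage agglomerative procedure (a hypothetical helper written for this example) on evenly spaced values, which have no group structure at all, and still gets back exactly the number of clusters requested.

```python
def single_linkage(values, k):
    """Agglomerative single-linkage clustering of 1-D values down to k clusters."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None  # (distance, i, j) for the closest pair of clusters so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Evenly spaced values: every neighbouring pair is equally close.
evenly_spaced = [float(v) for v in range(9)]
clusters = single_linkage(evenly_spaced, k=3)

print(f"requested 3 clusters, got {len(clusters)}")
for members in clusters:
    print(members)
```

Because all neighbouring distances are tied, which values end up together depends only on the order in which ties are broken, not on anything in the data.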
Principal Components Analysis (PCA) starts directly from a character table to obtain non-hierarchic groupings in a multi-dimensional space. Any combination of components can be displayed in two or three dimensions. Discriminant analysis is very similar to PCA. The major difference is that PCA calculates the best discriminating components without foreknowledge about groups, whereas discriminant analysis calculates the best discriminating components (= discriminants) for groups that are defined by the user.
Principal component analysis involves extracting linear composites of observed variables.
Factor analysis is based on a formal model predicting observed variables from theoretical latent factors.
In terms of a simple rule of thumb, I'd suggest that you:
Run factor analysis if you assume or wish to test a theoretical model of latent factors causing observed variables.
Run principal component analysis if you want to simply reduce your correlated observed variables to a smaller set of important independent composite variables.
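To make the principal component side of this rule concrete, here is a small sketch for the two-variable case, where the components can be written down without a linear algebra library. The data are invented, and the closed-form eigenvalues and eigenvectors used here hold only for two standardized variables.

```python
import math

def standardize(values):
    """Centre to mean 0 and scale to sample standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def sample_cov(a, b):
    """Sample covariance of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)

# Made-up correlated observed variables.
x = [2.0, 4.0, 6.0, 8.0, 10.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
zx, zy = standardize(x), standardize(y)
r = sample_cov(zx, zy)  # correlation, since both variables are standardized

# The correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 - r,
# with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
# (1 + r is the larger eigenvalue here because r is positive.)
eigenvalues = [1 + r, 1 - r]

# Each component is a linear composite of the observed variables.
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(zx, zy)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(zx, zy)]

print(f"correlation of observed variables: {r:.2f}")
print(f"variance of first composite:  {sample_cov(pc1, pc1):.2f}")
print(f"variance of second composite: {sample_cov(pc2, pc2):.2f}")
print(f"covariance between composites: {sample_cov(pc1, pc2):.2f}")
```

The composites are uncorrelated and the first one carries most of the variance, which is the data-reduction use described above. Nothing in the calculation posits latent factors causing the observed variables; that modelling assumption is what distinguishes factor analysis.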