Unsupervised Algorithm
Dimensionality Reduction
PCA (Principal Component Analysis) → Unsupervised
LDA (Linear Discriminant Analysis) → Supervised
- maximizes separation of known classes
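The PCA/LDA contrast above can be sketched as follows. This is a minimal illustration assuming scikit-learn; the toy data and the variable names (`X`, `y`, `X_pca`, `X_lda`) are made up for the example and are not part of the diagram.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Hypothetical toy data: two classes in 4 dimensions, shifted apart on every axis.
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.array([0] * 50 + [1] * 50)

# PCA is unsupervised: it sees only X and keeps the directions of highest variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it needs the labels y and yields at most (n_classes - 1) components.
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```

Note that only the LDA call receives `y`, which is why LDA is marked as supervised even though it sits in this diagram.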
Clustering
Hierarchical Clustering
- does not require specifying # of clusters
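- slower than k-means (agglomerative methods typically need at least O(n²) time and memory for pairwise distances) but more flexible: any distance metric or linkage can be used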
Linkage criteria
Ward Linkage (variance minimization)
Approach
Divisive Hierarchical Cluster Analysis
Agglomerative Hierarchical Cluster Analysis
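A rough sketch of agglomerative clustering with Ward linkage, assuming scikit-learn's `AgglomerativeClustering`. The toy data and the `distance_threshold` value of 10.0 are arbitrary choices for illustration; in practice the threshold is usually read off a dendrogram.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Hypothetical toy data: two tight, well-separated blobs of 30 points each.
X = np.vstack([rng.normal(0.0, 0.5, (30, 2)), rng.normal(5.0, 0.5, (30, 2))])

# No cluster count is given: merging stops once the Ward linkage distance
# between the closest pair of clusters exceeds distance_threshold.
model = AgglomerativeClustering(
    n_clusters=None, distance_threshold=10.0, linkage="ward"
)
labels = model.fit_predict(X)

print(model.n_clusters_)  # number of clusters found for this threshold
```

Passing `n_clusters=None` together with a threshold is one way to get the "does not require specifying # of clusters" behaviour; the cut level still has to be chosen.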
DBSCAN
- Density-Based Spatial Clustering of Applications with Noise
- does not require specifying # of clusters
- labels isolated points in low-density regions as noise (i.e. it is robust to outliers)
- can find arbitrarily shaped clusters, but struggles when clusters have widely varying densities or are not separated by low-density regions
- hyperparameters: min_samples (minimum number of points in a neighborhood for a point to count as a core point), eps (epsilon, the neighborhood radius)
- key terms: core point, border point, noise point, density connected points
- Important: DBSCAN fits a dataset but does not predict on new data. Running it on a test set treats that set as a new dataset and finds clusters independently of the training set
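The DBSCAN points above can be sketched with scikit-learn's `DBSCAN`. The data, `eps=1.0`, and `min_samples=5` are illustrative assumptions, not recommended values.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Hypothetical toy data: two dense blobs plus one far-away point.
blob_a = rng.normal(0.0, 0.3, (40, 2))
blob_b = rng.normal(5.0, 0.3, (40, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob_a, blob_b, outlier])

db = DBSCAN(eps=1.0, min_samples=5).fit(X)
labels = db.labels_  # noise points receive the label -1

# Count clusters, ignoring the noise label.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, labels[-1])

# scikit-learn's DBSCAN exposes fit / fit_predict only; there is no predict().
print(hasattr(db, "predict"))
```

The missing `predict` method reflects the last bullet: assigning new points requires refitting or a separate rule, such as nearest core point.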
K-Means Clustering
- hyperparameters: n_clusters (k), n_init (number of times the algorithm is run with different centroid seeds), init (initialization method: k-means++ or random), max_iter, tol (tolerance)
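A minimal sketch of how these hyperparameters map onto scikit-learn's `KMeans`. The toy data and the specific values are assumptions for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical toy data: three blobs centred at (0, 0), (5, 5) and (10, 10).
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0.0, 5.0, 10.0)])

km = KMeans(
    n_clusters=3,       # k
    init="k-means++",   # initialization method ("k-means++" or "random")
    n_init=10,          # runs with different centroid seeds; the best one is kept
    max_iter=300,       # cap on iterations per run
    tol=1e-4,           # convergence tolerance on centroid movement
    random_state=0,     # fixed seed so the sketch is reproducible
)
labels = km.fit_predict(X)

# Unlike DBSCAN, a fitted KMeans can assign new points to existing clusters.
new_label = km.predict(np.array([[0.0, 0.0]]))
print(km.cluster_centers_.shape, km.inertia_, new_label)
```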
Find optimal K:
- Elbow method (calculate WCSS)
- Silhouette score (compares each point's mean intra-cluster distance with its mean distance to the nearest other cluster; range: -1 to 1, higher is better)
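Both ways of choosing K can be sketched in one loop, assuming scikit-learn. Here `inertia_` is used as the WCSS; the toy data and the candidate range of 2 to 6 are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical toy data: three blobs centred at (0, 0), (5, 5) and (10, 10).
X = np.vstack([rng.normal(c, 0.5, (40, 2)) for c in (0.0, 5.0, 10.0)])

wcss = {}  # within-cluster sum of squares per k (elbow method)
sil = {}   # mean silhouette score per k
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss[k] = km.inertia_
    sil[k] = silhouette_score(X, km.labels_)

# Elbow: look for the k after which WCSS stops dropping sharply.
# Silhouette: pick the k with the highest mean score.
best_k = max(sil, key=sil.get)
print(wcss)
print(sil)
print(best_k)
```

The elbow is usually judged by eye from a plot of `wcss`, whereas the silhouette score gives a single number per k that can be maximized directly.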