Clustering
Need
- Proximity Measure
  - Dissimilarity Measure (Distance)
- Method
Good clustering
- Minimum within-cluster variation
Within-cluster variation, W(C)
- Measures how much the samples within
each cluster differ from one another
\(W(C_k) = \frac{1}{|C_k|}\sum_{i,i' \in C_k}\sum_{j=1}^{p}(x_{ij}-x_{i'j})^2 \)
where
\(|C_k| \) = number of samples in cluster k
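As a quick sanity check, the formula above can be computed directly in pure Python (the tiny 2-D cluster below is made up for illustration). Note the sum runs over ordered pairs \(i, i'\), so each pair of distinct points is counted twice:

```python
def within_cluster_variation(cluster):
    """W(C_k) for one cluster: cluster is a list of p-dimensional points."""
    n = len(cluster)
    total = 0.0
    # Sum squared distances over all ordered pairs (i, i'), incl. i = i' (zero terms).
    for xi in cluster:
        for xj in cluster:
            total += sum((a - b) ** 2 for a, b in zip(xi, xj))
    return total / n

points = [[1.0, 2.0], [2.0, 2.0], [1.0, 3.0]]
print(within_cluster_variation(points))  # ≈ 2.667 (= 8/3)
```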
K-Means
Clustering
Goal
- Partition the samples \( x_i \) such that the distance of each sample
to its closest \( u_k \) is minimized
- \( u_k \) is the mean (centroid) of cluster k
Algorithm
- Initialize the centroid vectors \( u_k \)
- For each sample, calculate its distance to each \( u_k \) & assign it to the nearest cluster
- Update each \( u_k \) by calculating the mean of the samples belonging to its cluster
- Repeat the assignment and update steps until the cluster assignments no longer change
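The steps above can be sketched in pure Python. The dataset and the fixed seed are illustrative only; in practice a library implementation (e.g. scikit-learn's `KMeans`) would be used:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch. points: list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize u_k from the data
    for _ in range(iters):
        # Assignment step: each sample goes to the nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: u_k becomes the mean of its assigned samples.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(sum(dim) / len(members)
                                     for dim in zip(*members))
    return centroids, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centroids, clusters = kmeans(pts, 2)
```

With two well-separated groups like these, the assignments settle after the first couple of iterations regardless of which points are picked as initial centroids.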
Hierarchical
Clustering
An approach that does not require specifying the number of clusters K in advance
and produces deterministic results
Agglomerative / Bottom-up
- Start with each point as its own cluster
- Identify the two closest clusters & merge them
- Repeat until all points are in a single cluster
Linkage
- Distance between two clusters
- Computed from all pairwise distances between data points
in clusters A & B
Average
- Take the average of all pairwise dissimilarities
- Mean inter-cluster dissimilarity
- \(D_{G,H} = \frac{1}{|G||H|}\sum_{i \in G,\, j \in H} d(x_i, x_j) \)
Complete
- Take the largest distance
- Maximal inter-cluster dissimilarity
Single
- Take the smallest distance
- Minimal inter-cluster dissimilarity
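The three linkage rules can be compared directly on two small made-up clusters G and H: single linkage takes the minimum of the pairwise distances, complete the maximum, and average the mean, matching the definitions above:

```python
def euclid(x, y):
    """Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def pairwise(G, H, d):
    """All |G|*|H| distances between points in cluster G and cluster H."""
    return [d(x, y) for x in G for y in H]

G = [(0.0, 0.0), (0.0, 1.0)]
H = [(3.0, 0.0), (4.0, 0.0)]

dists = pairwise(G, H, euclid)
single = min(dists)                        # minimal inter-cluster dissimilarity
complete = max(dists)                      # maximal inter-cluster dissimilarity
average = sum(dists) / (len(G) * len(H))   # mean inter-cluster dissimilarity
```

Here `single` is 3.0 (the closest pair), while `complete` is the distance between the farthest pair, \(\sqrt{17}\).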