Module 5 - Clustering pt.3
[Partitioning Algorithm]
Clustering Approaches
K-means
K‐Means Clustering Methodology
(1) K points are placed into the data space, representing the initial group of centroids.
(2) Each object (data point) is assigned to its closest centroid.
(3) After all objects are assigned, the positions of the k centroids are recalculated.
(4) Steps 2 and 3 are repeated until the positions of the centroids no longer move.
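The four steps above map directly onto a short from-scratch implementation. The sketch below is a minimal illustration (not from the slides); the sample data, k, the iteration cap, and the random seed are assumptions made for the example.

```python
# Minimal k-means sketch following the four steps above (illustrative assumptions only).
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: place k points in the data space as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recalculate each centroid as the mean of the objects assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 4: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on a small 2-D dataset (assumed values)
X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5], [1.0, 0.5], [8.5, 9.0]])
labels, centroids = kmeans(X, k=2)
print(labels, centroids)
```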
Variations of the K‐Means Method:
Most variants of k-means differ in:
Selection of the initial k means
Dissimilarity calculations
Strategies to calculate cluster means
Handling categorical data:
k-modes
Replacing means of clusters with modes
Using new dissimilarity measures to deal with categorical objects
Using a frequency-based method to update modes of clusters
A mixture of categorical and numerical data: k-prototype method
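The k-modes ideas listed above (modes instead of means, a matching dissimilarity for categorical objects, and a frequency-based mode update) can be sketched in a few lines. This is only an illustrative sketch of those two building blocks, not a full k-modes or k-prototype implementation; the example records are invented for the demo.

```python
# Sketch of the two k-modes building blocks: matching dissimilarity and frequency-based modes.
from collections import Counter

def matching_dissimilarity(a, b):
    # Count the attributes on which two categorical objects differ
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(objects):
    # Frequency-based "mode" object: the most common category per attribute
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*objects))

# Example: three categorical records (colour, size)
cluster = [("red", "small"), ("red", "large"), ("blue", "small")]
print(cluster_mode(cluster))                                            # ('red', 'small')
print(matching_dissimilarity(("red", "large"), cluster_mode(cluster)))  # 1
```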
Example of K‐Means Clustering
Problem of the K‐Means Method:
Sensitive to outliers
An object with an extremely large value may substantially distort the distribution of the data
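A small numeric example makes the outlier problem concrete: one extreme value drags the mean far away from the bulk of the data, while the medoid (an actual, centrally located object) is barely affected. The values below are assumed for illustration.

```python
# One outlier shifts the mean substantially, but not the medoid.
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0, 100.0])   # 100.0 is the outlier
mean = values.mean()                              # 22.0
# Medoid = the observation with the smallest total distance to all others
medoid = values[np.abs(values[:, None] - values[None, :]).sum(axis=1).argmin()]  # 3.0
print(mean, medoid)
```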
K-Medoids
Description:
Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster.
K‐Medoid Clustering Methodology
(1) arbitrarily choose k objects in D as the initial representative objects or seeds;
(2) repeat
(3) assign each remaining object to the cluster with the nearest representative object;
(4) randomly select a non-representative object, o_random;
(5) compute the total cost, S, of swapping representative object o_j with o_random;
(6) if S < 0 then swap o_j with o_random to form the new set of k representative objects;
(7) until no change;
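The seven steps above correspond to the PAM-style procedure sketched below. This is a minimal, assumed implementation for illustration only: it uses the sum of Euclidean distances as the cost S and exhaustively tries every candidate swap, which is fine for small data but not how a production k-medoids would be written.

```python
# Minimal PAM-style k-medoids sketch following steps (1)-(7) above (illustrative assumptions).
import numpy as np

def total_cost(X, medoid_idx):
    # Sum of distances from every object to its nearest representative object
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def k_medoids(X, k, seed=0):
    rng = np.random.default_rng(seed)
    # (1) arbitrarily choose k objects as the initial representative objects
    medoids = list(rng.choice(len(X), size=k, replace=False))
    changed = True
    while changed:                                   # (2)/(7) repeat until no change
        changed = False
        for o_random in range(len(X)):               # (4) candidate non-representative objects
            if o_random in medoids:
                continue
            for j in range(k):
                candidate = medoids.copy()
                candidate[j] = o_random
                # (5)/(6) swap o_j with o_random if the total cost S decreases
                if total_cost(X, candidate) < total_cost(X, medoids):
                    medoids, changed = candidate, True
    # (3) assign each object to the cluster with the nearest representative object
    d = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=2)
    return d.argmin(axis=1), medoids

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0], [8.0, 8.0], [8.5, 9.0], [50.0, 50.0]])
labels, medoids = k_medoids(X, k=2)
print(labels, medoids)
```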
Example of K‐Medoid Clustering
Major Clustering Approaches
Partitioning
Description:
The task is to categorize the items into groups.
The algorithm categorizes the items into k groups by similarity.
To calculate that similarity, the Euclidean distance is used as the measurement (see the sketch after this list).
Partitioning classifies the data into multiple groups based on the characteristics and similarity of the data.
Grid-based
Hierarchical
Density-based
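For the partitioning approach described above, a library call is usually enough in practice. The sketch below uses scikit-learn's KMeans, which groups the items into k clusters by (squared) Euclidean distance; the sample data and k = 2 are assumptions for the example.

```python
# Partitioning a small dataset into k = 2 groups with scikit-learn's KMeans.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 11.0]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. [0 0 1 1] (cluster ids may be permuted)
```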
Methodology image from geeks4geeks goes here
Image from tele goes here
Image of slide no. 18 goes here