Unsupervised Learning
Clustering
Groups similar data points together
Simplifies data by reducing many data points into a few clusters
Types
Connectivity Based Clustering
Calculates the distance between every pair of rows and finds the two points that are closest to each other
It is very computationally expensive
n * (n-1) / 2 distance calculations, where n = number of rows
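To make the n * (n-1) / 2 count concrete, here is a minimal Python sketch that computes the distance for every pair of rows and keeps the closest pair. The function name closest_pair, the sample rows, and the choice of Euclidean distance are assumptions made for illustration; they are not part of the original notes.

```python
import math
from itertools import combinations

def closest_pair(rows):
    """Return ((distance, i, j), number_of_distances) for the two closest rows."""
    best = None
    count = 0
    for i, j in combinations(range(len(rows)), 2):
        d = math.dist(rows[i], rows[j])  # Euclidean distance (assumed metric)
        count += 1
        if best is None or d < best[0]:
            best = (d, i, j)
    return best, count

# Hypothetical sample data: 4 rows, so 4 * 3 / 2 = 6 pairs are compared.
rows = [(0.0, 0.0), (5.0, 5.0), (0.0, 1.0), (9.0, 9.0)]
best, count = closest_pair(rows)
print(count)  # 6
print(best)   # (1.0, 0, 2)
```

With n rows the loop body runs n * (n-1) / 2 times, which is why this approach becomes expensive as n grows.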
Centroid Based Clustering
We define a number of clusters K
We compute the distance from each row to those K centers and assign each row to the group of its closest centroid
n * k distance calculations per pass, where n = number of rows and k = number of clusters
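The assignment step and its n * k cost can be sketched as follows. This is a minimal illustration, assuming Euclidean distance; the function name assign_to_centers and the sample rows and centers are hypothetical.

```python
import math

def assign_to_centers(rows, centers):
    """Label each row with the index of its closest center and count the distances computed."""
    labels = []
    n_distances = 0
    for row in rows:
        dists = [math.dist(row, c) for c in centers]  # k distances per row
        n_distances += len(dists)
        labels.append(dists.index(min(dists)))
    return labels, n_distances

# Hypothetical data: n = 4 rows, k = 2 centers, so 4 * 2 = 8 distances.
rows = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centers = [(1.0, 1.5), (8.5, 9.0)]
labels, n_distances = assign_to_centers(rows, centers)
print(labels)       # [0, 0, 1, 1]
print(n_distances)  # 8
```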
Lloyd's algorithm
- Start by defining K
- Assign K initial centers
- Calculate the distance from every point to every center
- Assign each point to the closest center
- Discard the initial centers and recalculate the centroid of each group
- Repeat the assignment and recalculation steps until the assignments stop changing
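The steps of Lloyd's algorithm can be sketched in plain Python as shown here. This is an illustrative sketch rather than a reference implementation: the function name lloyd_kmeans, the seed and max_iter parameters, picking the initial centers at random from the rows, and keeping the old center when a group ends up empty are all assumptions.

```python
import math
import random

def lloyd_kmeans(rows, k, max_iter=100, seed=0):
    """Basic Lloyd-style K-Means on a list of numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # assumption: initial centers are k random rows
    labels = None
    for _ in range(max_iter):
        # Assign each point to the closest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(row, centers[c]))
            for row in rows
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Recalculate the centroid of each group.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # assumption: an empty group keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels

# Hypothetical data: two well-separated groups of three points.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = lloyd_kmeans(rows, k=2)
print(labels)
print(centers)
```

Which group receives label 0 depends on the random initial centers, so only the grouping itself is meaningful, not the label numbers.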
Defining K
good balance between
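Elbow method
The elbow method is used to choose a suitable value of K for the K-Means clustering algorithm.
The basic idea is to plot the cost (for example, the within-cluster sum of squared distances) against K and pick the K at the "elbow" of the curve, where increasing K further yields only small reductions in cost.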
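As a rough illustration of the elbow method, the sketch below computes the cost for several values of k on a small toy data set. It assumes the cost is the within-cluster sum of squared Euclidean distances. To keep the run repeatable it swaps in a deterministic farthest-point initialization instead of random starting centers; that choice, the function name kmeans_cost, and the sample rows are assumptions, not something taken from the original notes.

```python
import math

def kmeans_cost(rows, k, max_iter=100):
    """Return the within-cluster sum of squared distances after a basic K-Means run."""
    # Assumption: farthest-point initialization (deterministic, not random).
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(math.dist(r, c) for c in centers)))
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for row in rows:
            nearest = min(range(k), key=lambda c: math.dist(row, centers[c]))
            groups[nearest].append(row)
        # Recompute centroids; an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(math.dist(row, c) for c in centers) ** 2 for row in rows)

# Hypothetical data: three well-separated groups of three points each.
rows = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
        (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
costs = {}
for k in range(1, 7):
    costs[k] = kmeans_cost(rows, k)
    print(k, round(costs[k], 2))
```

On this toy data the cost should fall sharply up to k = 3 and change little afterwards, so k = 3 is where the elbow appears. On real data the bend is usually less clear-cut and picking it involves some judgment.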
Pros & Cons
Pros
- Very simple
- Computationally efficient
- Once the centroids are identified, it is easy to assign new objects to a cluster
- Because it is unsupervised, it eliminates subjectivity
Cons
- K must be chosen in advance, and it is not obvious how to choose it
- Sensitive to the initial starting points
- Sensitive to the curse of dimensionality
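The sensitivity to starting points can be seen on a very small example. The sketch below runs the same Lloyd-style loop from two different sets of initial centers on four points at the corners of a wide rectangle. The function name lloyd_from, the data, and both sets of starting centers are hypothetical choices made for illustration.

```python
import math

def lloyd_from(rows, centers, max_iter=100):
    """Run Lloyd-style K-Means from the given initial centers; return (centers, cost)."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for row in rows:
            nearest = min(range(len(centers)), key=lambda c: math.dist(row, centers[c]))
            groups[nearest].append(row)
        # Recompute centroids; an empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(math.dist(row, c) for c in centers) ** 2 for row in rows)
    return centers, cost

# Four corners of a 4-by-1 rectangle.
rows = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
_, cost_a = lloyd_from(rows, [(0.0, 0.5), (4.0, 0.5)])  # start split left/right
_, cost_b = lloyd_from(rows, [(2.0, 0.0), (2.0, 1.0)])  # start split bottom/top
print(cost_a)  # 1.0
print(cost_b)  # 16.0
```

Both runs stop at a stable assignment, but the second start stays in a much worse grouping. A common mitigation is to run the algorithm from several starting points and keep the result with the lowest cost.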