Clustering (Unsupervised Learning)
Commonly used when: the data grows too fast to be labeled, or the data is not understood well enough to assign labels
As a stand-alone tool to get insight into data distribution
As a preprocessing step for other algorithms
Good Clustering
high intra-class similarity
low inter-class similarity
Measure
Dissimilarity/Similarity metric: similarity is expressed in terms of a distance function d(i, j), which is typically a metric
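As an illustration of a distance function d(i, j), here is a minimal sketch using the Minkowski distance, which gives the Manhattan distance for p = 1 and the Euclidean distance for p = 2. The function name `minkowski` and the sample points are assumptions made for this example; the notes do not prescribe a particular distance.

```python
def minkowski(x, y, p=2):
    """Minkowski distance between two equal-length numeric vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Hypothetical sample points, chosen only for illustration.
i, j, k = (0.0, 0.0), (3.0, 4.0), (6.0, 0.0)

print(minkowski(i, j, p=2))  # Euclidean distance between i and j
print(minkowski(i, j, p=1))  # Manhattan distance between i and j

# Properties a metric is expected to satisfy, checked on these points:
print(minkowski(i, i) == 0)                                   # d(i, i) = 0
print(minkowski(i, j) == minkowski(j, i))                     # symmetry
print(minkowski(i, k) <= minkowski(i, j) + minkowski(j, k))   # triangle inequality
```

A smaller d(i, j) means the two objects are more similar, so the same function serves as both a dissimilarity and (inverted) a similarity measure.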
Requirements
Scalability
Ability to deal with different types of attributes
Ability to handle dynamic data
Discovery of clusters with arbitrary shape
Minimal requirements for domain knowledge to determine input parameters
Ability to deal with noise and outliers
Insensitive to order of input records
High dimensionality (able to handle high-dimensional data)
Incorporation of user-specified constraints (the clustering can take user-defined conditions into account)
Interpretability and usability
Typical Alternatives to Calculate the Distance between Clusters
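Single link: the distance between the two closest points, one from each cluster
Complete link: the distance between the two farthest points, one from each cluster
Average link: compute the distance from every point in one cluster to every point in the other cluster, then take the average
Centroid: compute the center point of each cluster and use the distance between the two centers
Medoid: use the data point closest to each cluster's center (the medoid) to compute the distance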
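The five alternatives above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: it assumes Euclidean distance (`math.dist`, Python 3.8+), the function names and the two sample clusters are made up for the example, and the medoid follows the definition in these notes (the member closest to the centroid) rather than the other common definition (the member with the smallest total distance to all other members).

```python
from itertools import product
from math import dist  # Euclidean distance between two points


def single_link(a, b):
    """Smallest distance between a point of cluster a and a point of cluster b."""
    return min(dist(p, q) for p, q in product(a, b))


def complete_link(a, b):
    """Largest distance between a point of cluster a and a point of cluster b."""
    return max(dist(p, q) for p, q in product(a, b))


def average_link(a, b):
    """Mean distance over all cross-cluster pairs of points."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid(points):
    """Coordinate-wise mean of a cluster; usually not an actual data point."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def centroid_distance(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


def medoid(points):
    """Actual data point closest to the centroid (definition used in these notes)."""
    c = centroid(points)
    return min(points, key=lambda p: dist(p, c))


def medoid_distance(a, b):
    """Distance between the two cluster medoids."""
    return dist(medoid(a), medoid(b))


# Hypothetical sample clusters on a line, chosen so the measures differ.
cluster_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
cluster_b = [(5.0, 0.0), (6.0, 0.0), (10.0, 0.0)]

for name, fn in [("single", single_link), ("complete", complete_link),
                 ("average", average_link), ("centroid", centroid_distance),
                 ("medoid", medoid_distance)]:
    print(name, fn(cluster_a, cluster_b))
```

On these sample clusters the point (10, 0) pulls the centroid of cluster_b to (7, 0), while its medoid stays at the real data point (6, 0), so the centroid and medoid distances differ.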
Partitioning Approach