Provost CH6: Similarity, Neighbors, and Clusters (Combining Functions:…

- - - - Manhattan distance (L1 norm)
        
        $d_{\text{Manhattan}}(X, Y) = \|X - Y\|_1 = |x_1 - y_1| + |x_2 - y_2| + \cdots$
        
        the sum of the (unsquared) pairwise distances
      - Jaccard distance
        
        $d_{\text{Jaccard}}(X, Y) = 1 - \frac{|X \cap Y|}{|X \cup Y|}$
        
        treats the two objects as sets of characteristics
      - Euclidean distance (L2 norm)
        
        $d_{\text{Euclidean}}(X, Y) = \|X - Y\|_2 = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots}$
        
        reduces a comparison of two (potentially complex) examples into a single number
      - Cosine distance
        
        $d_{\text{cosine}}(X, Y) = 1 - \frac{X \cdot Y}{\|X\|_2 \, \|Y\|_2}$
        
        used in text classification to measure the similarity of two documents
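
The four distance functions above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names are made up for this example, vectors are assumed to be equal-length sequences of numbers, and the Jaccard inputs are assumed to be collections of hashable characteristics.

```python
import math

def manhattan_distance(x, y):
    """L1 norm of X - Y: sum of absolute per-dimension differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean_distance(x, y):
    """L2 norm of X - Y: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def jaccard_distance(x, y):
    """1 - |X intersect Y| / |X union Y|, treating both objects as sets."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        # Assumption for this sketch: two empty sets count as identical.
        return 0.0
    return 1 - len(x & y) / len(union)

def cosine_distance(x, y):
    """1 - (X . Y) / (||X||_2 * ||Y||_2); undefined for a zero vector."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1 - dot / (norm_x * norm_y)
```

For the points (0, 0) and (3, 4), the Manhattan distance is 7 and the Euclidean distance is 5.0. Cosine distance ignores vector length: (1, 2) and (2, 4) point in the same direction, so their cosine distance is approximately 0 even though their Euclidean distance is not.
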
- - - - distance between A and B; length of the hypotenuse
      - $\sqrt{(x_A - x_B)^2 + (y_A - y_B)^2}$
    - - $\sqrt{(d_{1,A} - d_{1,B})^2 + (d_{2,A} - d_{2,B})^2 + \cdots + (d_{n,A} - d_{n,B})^2}$
- - - - whether justifications are adequate depends on the application
- - - - a clustering because it groups the points by their similarity
      - consider “clipping” the dendrogram with a horizontal line, ignoring everything above the line
    - - represent each cluster by its “cluster center,” or centroid
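
A centroid can be computed as the coordinate-wise mean of a cluster's points. The sketch below assumes numeric points of equal dimension; the `centroid` helper and the two example clusters are hypothetical and not taken from the chapter.

```python
def centroid(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    # zip(*points) regroups the data by dimension: all x values, all y values, ...
    return tuple(sum(coords) / n for coords in zip(*points))

# Hypothetical clusters, made up for illustration.
clusters = {
    "A": [(1.0, 1.0), (2.0, 1.0), (1.5, 2.5)],
    "B": [(8.0, 8.0), (9.0, 10.0)],
}

# Summarize each cluster by a single representative point.
centers = {name: centroid(pts) for name, pts in clusters.items()}
print(centers)
```

Each cluster is then summarized by one point, which need not coincide with any actual member of the cluster.
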