Chapter 6 (Similarity (Data Mining: Often group by similarity (Search for…

- - - - Can be used for classification & regression
        
        Group into clusters
        
        Provide recommendations (e.g., people who like "x" also like "y")
        
        Similarity + distance to create predictive models
        
        Data points near each other are treated similarly
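
A minimal sketch of the similarity-as-distance idea above, using Euclidean distance. The customer names and their (age, income) values are made up for illustration, and `euclidean` and `most_similar` are hypothetical helper names, not part of the notes.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by (age, income in $1000s).
customers = {
    "ann": (25, 40),
    "bob": (27, 42),
    "cat": (60, 90),
}

def most_similar(name, data):
    """Return the other entries, ordered from nearest to farthest."""
    target = data[name]
    others = [(euclidean(target, v), k) for k, v in data.items() if k != name]
    return [k for _, k in sorted(others)]

print(most_similar("ann", customers))
```

The nearest entries are the "similar" ones, so a recommendation could be drawn from what the nearest customers liked.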
- - - - Use an odd number of neighbors (k) so two-class majority voting cannot tie
        
        Referred to as k-NN
        
        Heterogeneous attributes pose a problem (different scales and types distort distances)
        
        Fix the problem with feature selection or by manually tuning the similarity/distance function
        
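        Training is fast (instances are simply stored); prediction can be slow because each query is compared against the stored instances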
        
        Categorical attributes must be coded numerically before distances can be computed
        
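
The k-NN notes above can be sketched roughly as follows: find the k nearest stored examples, then take a majority vote (classification) or an average (regression). This is an illustrative sketch, not a reference implementation; the function names and the toy data are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def nearest(query, examples, k):
    """Return the k examples whose feature vectors are closest to query."""
    return sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]

def knn_classify(query, examples, k=3):
    """Majority vote among the k nearest (features, label) examples."""
    votes = Counter(label for _, label in nearest(query, examples, k))
    return votes.most_common(1)[0][0]

def knn_regress(query, examples, k=3):
    """Average target value of the k nearest (features, value) examples."""
    values = [value for _, value in nearest(query, examples, k)]
    return sum(values) / len(values)

# Made-up training data: two numeric features and a class label.
labeled = [((1, 1), "yes"), ((1, 2), "yes"), ((2, 1), "yes"),
           ((8, 8), "no"), ((9, 8), "no")]
# Made-up training data: one numeric feature and a numeric target.
valued = [((1,), 10.0), ((2,), 20.0), ((3,), 30.0), ((10,), 100.0)]

print(knn_classify((2, 2), labeled, k=3))
print(knn_regress((2,), valued, k=3))
```

With k = 3 and two classes, the vote is always 2-1 or 3-0, which is why an odd k is convenient. Attributes on very different scales would dominate `math.dist`, so rescaling them first is usually advisable.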
- - - - Dendrogram: shows hierarchy of clusters
        
        Clusters are merged until one cluster remains
        
        Linkage function: e.g., Euclidean distance between the closest points in each cluster
        
        Clustering Around Centroids
        
        Each cluster is represented by its center, the "centroid"
        
        Take the mean of each cluster's points and move the centroid to it
        
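
As a rough illustration of the two clustering ideas above, the sketch below shows a closest-point linkage function (the kind used when merging clusters hierarchically) and a basic centroid loop that assigns points to the nearest centroid and then moves each centroid to the mean of its cluster. The function names, the starting centroids, and the sample points are all assumptions chosen for the example.

```python
import math

def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters: their closest pair of points."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def centroid_clustering(points, centroids, iterations=10):
    """Repeat: assign each point to its nearest centroid, then move centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Made-up 2-D points forming two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = centroid_clustering(points, [(0, 0), (10, 10)])
print(centroids)
print(single_linkage(clusters[0], clusters[1]))
```

The result depends on the starting centroids, so in practice the loop is often run from several starting points and the best grouping is kept.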
- - - - Run a supervised learning algorithm, with cluster membership as the label, to create a classifier
        
        Differential descriptions for each cluster (what distinguishes it from the others)
- - - - Separate into supervised/unsupervised
        
        Check whether the results actually solve the problem
        
        Creativity must be applied