Ch.6: Similarity, Neighbors, and Clusters (Similarity (Uses (Retrieval…

- - - - Euclidean distance: the straight-line distance between two points, computed with the Pythagorean theorem
  - - - How to describe a Scotch whisky as a feature vector
        
        Color, nose, body, palate, finish
    - - Measure the similarity of a new instance to the existing (training) instances
      - Use a combining function over the nearest neighbors' known target values to predict the new instance's target value
      - Classification
        
        The known target values (classes) of the nearest neighbors are consulted, e.g., by majority vote
      - Probability Estimation
        
        Assign a score to a new instance, e.g., the fraction of its neighbors that belong to each class
      - Regression
        
        Predict a numeric target value by combining the neighbors' values, e.g., their average or median
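  - - - k is the number of neighbors used
    - - Addressed by weighted voting, in which each neighbor's vote is scaled by its similarity to the new instance, so closer neighbors get more influence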
  - - - Intelligibility of the model as a whole
      - Justification of a specific decision
    - - Scaling of numeric attributes: attributes with large numeric ranges can dominate the distance
      - Similarity can be misled by the presence of too many irrelevant attributes (the curse of dimensionality)
      - Fix using feature selection
        
        or by manually tuning the similarity function, e.g., with domain knowledge
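    - - Classification step is expensive: each prediction requires searching the stored instances for the nearest neighbors
        
        This is a problem when predictions must be made very quickly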
- - - - Manhattan distance: sums the absolute differences along the individual dimensions
    - - Jaccard distance: treats the two objects as sets of characteristics
    - - Cosine distance: often used with text documents to measure the similarity of two documents
- - - - k-means: the "means" are the cluster centroids, each computed as the average of the values of the points in its cluster
      - k is simply the number of clusters that one would like to find in the data
      - A common concern is how to choose a good value for k
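
As a sketch of the Euclidean distance and Scotch feature-vector bullets above, the Python snippet below treats each whisky as a five-number vector, one value per attribute group (color, nose, body, palate, finish). The numeric encodings and the names `scotch_a`, `scotch_b`, and `euclidean_distance` are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math

# Hypothetical scores for the five attribute groups
# (color, nose, body, palate, finish); illustrative values only.
scotch_a = [1, 0, 2, 1, 3]
scotch_b = [1, 3, 2, 1, 7]


def euclidean_distance(x, y):
    """Square root of the summed squared differences between two feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


print(euclidean_distance(scotch_a, scotch_b))  # 5.0
```

The vectors differ by 3 and 4 in two dimensions, so the distance is the hypotenuse of a 3-4-5 right triangle. Python 3.8+ also provides `math.dist(p, q)`, which computes the same quantity.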
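
The nearest-neighbor bullets (classification, probability estimation, regression, and k as the number of neighbors) can be sketched as one neighbor search plus three combining functions. This is a minimal illustration, assuming Euclidean distance, training data stored as `(features, target)` pairs, and tiny hypothetical datasets; all function names are made up for this example. Majority vote, class fractions, and the mean are common combining functions, not the only ones.

```python
import math
from collections import Counter


def nearest_neighbors(train, query, k):
    """Return the k (features, target) pairs closest to `query` (Euclidean distance)."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]


def knn_classify(train, query, k):
    # Classification: majority vote over the neighbors' known classes.
    votes = Counter(label for _, label in nearest_neighbors(train, query, k))
    return votes.most_common(1)[0][0]


def knn_class_scores(train, query, k):
    # Probability estimation: fraction of the neighbors in each class.
    neighbors = nearest_neighbors(train, query, k)
    votes = Counter(label for _, label in neighbors)
    return {label: count / len(neighbors) for label, count in votes.items()}


def knn_regress(train, query, k):
    # Regression: average of the neighbors' numeric target values.
    neighbors = nearest_neighbors(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical training data: (features, target) pairs.
train_cls = [((1.0, 1.0), "yes"), ((1.2, 0.8), "yes"), ((0.9, 1.1), "no"),
             ((5.0, 5.0), "no"), ((6.0, 5.5), "no")]
train_reg = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(train_cls, (1.0, 1.0), k=3))   # yes
print(knn_class_scores(train_cls, (1.0, 1.0), k=3))
print(knn_regress(train_reg, (2.1,), k=3))        # 20.0
```

Note that `Counter.most_common` breaks ties in favor of the label encountered first, so a real implementation would want an explicit tie-breaking rule (or an odd k for two classes).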
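
Weighted voting can be sketched by scaling each neighbor's vote by its similarity to the new instance. Here the weight is assumed to be `1 / distance**2`, one common choice rather than a prescribed one; `weighted_knn_classify` and the data are hypothetical. In the usage below, a plain majority of the three neighbors would pick "B", but the much closer "A" neighbor wins once votes are weighted.

```python
import math
from collections import defaultdict


def weighted_knn_classify(train, query, k, eps=1e-9):
    """Return (winning label, per-label scores) using 1 / distance**2 vote weights."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    scores = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        # eps keeps the weight finite when a neighbor coincides with the query.
        scores[label] += 1.0 / (d ** 2 + eps)
    return max(scores, key=scores.get), dict(scores)


# Hypothetical one-dimensional training data.
train = [((1.0,), "A"), ((3.0,), "B"), ((3.5,), "B")]
label, scores = weighted_knn_classify(train, (0.0,), k=3)
print(label)  # A  (plain majority voting would pick B)
```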
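
To illustrate why scaling of numeric attributes matters, the sketch below uses hypothetical (age in years, income in dollars) records. Without scaling, income differences dwarf age differences, so the nearest neighbor of the first record is a 60-year-old with a similar income; after min-max scaling each attribute to [0, 1], a 27-year-old becomes the nearest. Min-max scaling and the helper names are assumptions for this example; standardizing to z-scores is a common alternative.

```python
import math


def min_max_scale(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest_index(rows, i):
    """Index of the row closest to rows[i] (excluding itself), by Euclidean distance."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Hypothetical records: (age in years, income in dollars).
raw = [[25, 50000], [60, 51000], [27, 53000], [40, 80000]]

print(nearest_index(raw, 0))                 # 1: income differences dominate
print(nearest_index(min_max_scale(raw), 0))  # 2: both attributes now count
```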
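
The three remaining distance bullets describe what are commonly called Manhattan distance (summing absolute differences per dimension), Jaccard distance (objects as sets of characteristics), and cosine distance (common for text documents). The functions below are a hypothetical minimal sketch of each, not a particular library's API; the example sets and word-count vectors are made up.

```python
import math


def manhattan_distance(x, y):
    # Sum of the absolute differences along each dimension.
    return sum(abs(a - b) for a, b in zip(x, y))


def jaccard_distance(set_x, set_y):
    # 1 - |shared characteristics| / |characteristics of either object|.
    union = set_x | set_y
    if not union:
        return 0.0
    return 1.0 - len(set_x & set_y) / len(union)


def cosine_distance(x, y):
    # 1 - cosine of the angle between the vectors; ignores vector length,
    # so a long and a short document with the same word mix look alike.
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - sum(a * b for a, b in zip(x, y)) / norm


print(manhattan_distance([1, 2], [4, 6]))             # 7
print(jaccard_distance({"smoky", "peaty", "sweet"},
                       {"smoky", "sweet", "fruity"}))  # 0.5
# Hypothetical word counts: the second document is the first one doubled.
print(cosine_distance([1, 2, 0], [2, 4, 0]))           # approximately 0.0
```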
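
The k-means bullets can be sketched with the standard iterative loop (often called Lloyd's algorithm): assign every point to its nearest centroid, recompute each centroid as the average of its members, and repeat until no assignment changes. Initializing with k randomly sampled data points, the fixed `seed`, and the `iterations` cap are assumptions of this sketch; `k_means` and the points are hypothetical. Note that k must be supplied up front, which is where the question of choosing a good k arises.

```python
import math
import random


def k_means(points, k, iterations=100, seed=0):
    """Return (centroids, cluster index per point) for plain k-means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members' values.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```

Because the result can depend on the starting centroids, it is common in practice to run k-means several times with different initializations and keep the best clustering.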