Similarity/Neighbors CH 6
Similarity and Distance
Once an object can be represented as data, we can begin to talk more precisely about the similarity between objects, or alternatively the distance between objects.
the closer two objects are in the space defined by the features, the more similar they are
instances near each other are treated similarly for some purpose
we need a basic method for measuring similarity or distance
we can compute the overall distance by combining the distances along the individual dimensions (features)
This is called the Euclidean distance
useful for comparing the similarity of one pair of instances to that of another pair
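As a sketch in Python (the feature values are invented for illustration), the Euclidean distance between instances A = (a1, …, an) and B = (b1, …, bn) is sqrt((a1-b1)² + … + (an-bn)²):

```python
import numpy as np

def euclidean_distance(a, b):
    """Combine the per-dimension differences: sqrt((a1-b1)^2 + ... + (an-bn)^2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

# Two instances described by the same numeric features (values purely illustrative)
instance_a = [63.0, 175.8]
instance_b = [45.0, 135.5]
print(euclidean_distance(instance_a, instance_b))  # ~44.1
```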
Nearest Neighbor
we could use this measure to find the companies most similar to our best corporate customers, or the online consumers most similar to our best retail customers
IBM does this to help direct its sales force. Online advertisers do this to target ads. These most-similar instances are called nearest neighbors
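A minimal sketch of such a lookup, assuming purely synthetic customer data with invented features (annual spend, visits per month): compute the distance from a reference instance to every candidate and keep the closest ones.

```python
import numpy as np

def nearest_neighbors(reference, candidates, k=3):
    """Return the indices and distances of the k candidates closest to the reference."""
    diffs = np.asarray(candidates, dtype=float) - np.asarray(reference, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    order = np.argsort(distances)[:k]
    return order, distances[order]

best_customer = [1200.0, 8.0]  # our best retail customer (made up)
prospects = [[300.0, 2.0], [1150.0, 7.0], [900.0, 9.0], [2500.0, 1.0]]
idx, dist = nearest_neighbors(best_customer, prospects, k=2)
print(idx, dist)  # the two prospects most similar to the best customer
```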
Predictive modeling
We also can use the idea of nearest neighbors to do predictive modeling in a different way
given a new example whose target variable we want to predict, we scan through all the training examples and choose several that are the most similar
Then we predict the new example's target value based on the nearest neighbors' (known) target values
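A sketch of that basic procedure (the training data, feature names, and choice of k are all made up): scan the training examples, keep the k most similar, and derive the prediction from their known targets.

```python
import numpy as np
from collections import Counter

def knn_predict(x_new, X_train, y_train, k=3):
    """Predict x_new's target from the targets of its k nearest training examples."""
    diffs = np.asarray(X_train, dtype=float) - np.asarray(x_new, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    neighbor_idx = np.argsort(distances)[:k]
    neighbor_targets = [y_train[i] for i in neighbor_idx]
    return Counter(neighbor_targets).most_common(1)[0][0]  # simplest combination: majority vote

# Hypothetical training set: (age, income) -> responded to offer?
X_train = [[35, 50_000], [22, 20_000], [63, 190_000], [59, 170_000]]
y_train = ["No", "No", "Yes", "Yes"]
print(knn_predict([61, 180_000], X_train, y_train, k=3))  # -> "Yes"
```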
Classification
Following the basic procedure introduced above, the nearest neighbors (in this example, three of them) are retrieved and their known target variables (classes) are consulted
Calculating scores from neighbors
Majority vote classification
Majority scoring function
Similarity-moderated classification
Similarity-moderated scoring
Similarity-moderated regression
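Hedged sketches of these combining functions, assuming the neighbors' classes, numeric targets, and distances have already been retrieved (all values made up). Majority vote picks the most common class; the majority scoring function reports the fraction of neighbors in the class of interest; the similarity-moderated versions weight each neighbor's contribution by its closeness (one common choice is weight = 1/distance²), and the same weights give a similarity-moderated regression as a weighted average of the neighbors' numeric targets.

```python
import numpy as np
from collections import Counter

# Retrieved neighbors of a new example (classes, numeric targets, distances: illustrative)
neighbor_classes = ["Yes", "No", "Yes"]
neighbor_values  = np.array([12.0, 30.0, 18.0])
neighbor_dists   = np.array([1.0, 2.0, 4.0])

# Majority vote classification: predict the most common class among the neighbors
majority_class = Counter(neighbor_classes).most_common(1)[0][0]

# Majority scoring function: fraction of neighbors voting for the class of interest
score_yes = neighbor_classes.count("Yes") / len(neighbor_classes)

# Similarity-moderated classification/scoring: closer neighbors count more
weights = 1.0 / neighbor_dists ** 2
is_yes = np.array([c == "Yes" for c in neighbor_classes], dtype=float)
weighted_score_yes = (weights * is_yes).sum() / weights.sum()
weighted_class = "Yes" if weighted_score_yes >= 0.5 else "No"

# Similarity-moderated regression: weighted average of the neighbors' numeric targets
weighted_prediction = (weights * neighbor_values).sum() / weights.sum()

print(majority_class, score_yes, round(weighted_score_yes, 3),
      weighted_class, round(weighted_prediction, 2))
```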
Technical details
Heterogeneous Attributes
Non-numeric attributes are more difficult to handle
The general principle at work is that care must be taken that the similarity/distance computation is meaningful for the application
encode categorical attributes as binary (0/1) variables when applicable
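One hedged sketch of this (attribute names and values invented): scale numeric attributes to a comparable range and encode the categorical attribute as a 0/1 variable before computing distances.

```python
import numpy as np

# Raw instances with heterogeneous attributes: (age in years, income in dollars, gender)
raw = [(35, 50_000, "M"), (22, 95_000, "F"), (63, 190_000, "F")]

ages    = np.array([r[0] for r in raw], dtype=float)
incomes = np.array([r[1] for r in raw], dtype=float)

def min_max(x):
    """Rescale a numeric attribute to [0, 1] so no single range dominates."""
    return (x - x.min()) / (x.max() - x.min())

# Binary encoding of the categorical attribute
is_female = np.array([1.0 if r[2] == "F" else 0.0 for r in raw])

X = np.column_stack([min_max(ages), min_max(incomes), is_female])
print(X)  # every attribute now contributes to the distance on a comparable scale
```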
Distance Functions
Euclidean distance
Manhattan distance
Jaccard distance
Cosine distance
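Rough sketches of these four functions for a pair of feature vectors (Jaccard shown for binary vectors, where it compares the sets of attributes the two instances have):

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def jaccard_distance(a, b):
    # For binary vectors: 1 - |intersection| / |union| of the "on" attributes
    a, b = a.astype(bool), b.astype(bool)
    return 1.0 - (a & b).sum() / (a | b).sum()

def cosine_distance(a, b):
    # 1 - cosine similarity; ignores vector length and compares direction only
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

x = np.array([1.0, 0.0, 1.0, 1.0])
y = np.array([1.0, 1.0, 0.0, 1.0])
print(euclidean(x, y), manhattan(x, y), jaccard_distance(x, y), cosine_distance(x, y))
```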
How many neighbors and how much influence
There is no simple answer to how many neighbors should be used
Odd numbers are convenient for breaking ties for majority vote classification with two-class problems
nearest-neighbor methods often use weighted voting or similarity-moderated voting
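For example, scikit-learn's KNeighborsClassifier exposes this choice directly through its weights parameter: 'uniform' gives a plain majority vote, while 'distance' weights each neighbor's vote by the inverse of its distance. The tiny two-class dataset below is made up.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [5.0, 7.0], [3.5, 5.0], [4.5, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Unweighted majority vote among the k nearest neighbors
uniform = KNeighborsClassifier(n_neighbors=3, weights="uniform").fit(X, y)
# Similarity-moderated voting: each neighbor's vote is weighted by 1/distance
weighted = KNeighborsClassifier(n_neighbors=3, weights="distance").fit(X, y)

new_point = np.array([[3.2, 4.5]])
print(uniform.predict_proba(new_point), weighted.predict_proba(new_point))
```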
Geometric Interpretation, Overfitting, and Complexity Control
in terms of overfitting and its avoidance, the k in a k-NN classifier is a complexity parameter
How to choose k?
we can conduct cross-validation or other nested holdout testing on the training set, for a variety of different values of k
Then, when we have chosen a value of k, we build a k-NN model from the entire training set
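A sketch of that procedure with scikit-learn (the dataset and candidate values of k are arbitrary stand-ins): cross-validate each k on the training set, pick the best, then refit on all the training data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in training set

# Estimate accuracy for a range of k values using 5-fold cross-validation
candidate_ks = [1, 3, 5, 7, 9, 15, 25]
cv_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    for k in candidate_ks
}

best_k = max(cv_scores, key=cv_scores.get)
print(cv_scores, "-> chosen k:", best_k)

# Having chosen k, build the final k-NN model from the entire training set
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X, y)
```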
Issues
Intelligibility
the justification of a specific decision
Whether such justifications are adequate depends on the application
intelligibility of an entire model
if model intelligibility and justification are critical, nearest-neighbor methods should be avoided
Dimensionality and domain knowledge
numeric attributes may have vastly different ranges
unless they are scaled appropriately, the effect of one attribute with a wide range can swamp the effect of another with a much smaller range
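A small sketch of this swamping effect (values invented): income, measured in dollars, dominates the Euclidean distance completely until both attributes are standardized.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# (age, income): income's range is thousands of times wider than age's
X = np.array([[25.0, 50_000.0], [60.0, 51_000.0], [26.0, 90_000.0]])

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

# Unscaled: the 26-year-old looks far from the 25-year-old because of income alone
print(euclidean(X[0], X[1]), euclidean(X[0], X[2]))   # ~1000.6 vs ~40000.0

# Standardized (zero mean, unit variance per attribute): age matters again
Z = StandardScaler().fit_transform(X)
print(euclidean(Z[0], Z[1]), euclidean(Z[0], Z[2]))
```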
Computational efficiency
the computational cost of a nearest-neighbor method is borne at prediction/classification time rather than at training time
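A sketch illustrating that asymmetry with scikit-learn (dataset sizes arbitrary): fitting does little beyond storing, and possibly indexing, the training data, while every prediction has to search it.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50_000, 20))
y_train = (X_train[:, 0] > 0).astype(int)
X_query = rng.normal(size=(1_000, 20))

model = KNeighborsClassifier(n_neighbors=5)

t0 = time.perf_counter()
model.fit(X_train, y_train)   # cheap: stores (and possibly indexes) the training data
t1 = time.perf_counter()
model.predict(X_query)        # expensive: searches the stored data for every query
t2 = time.perf_counter()

print(f"fit: {t1 - t0:.3f}s  predict: {t2 - t1:.3f}s")
```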