Provost CH 6 (Similarity (Data Mining (Grouping things by similarity or…

- - - - Pythagorean theorem
      - Essentially, the overall distance is computed by combining the distances along the individual dimensions (the individual features): square each difference, add them up, and take the square root
      - Extends to n dimensions = general Euclidean distance
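A minimal sketch of the general Euclidean distance described above, assuming each item is represented as a plain list of numeric features. The function name `euclidean_distance` and the sample points are illustrative, not taken from the chapter.

```python
import math

def euclidean_distance(a, b):
    """Distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square the difference along each dimension, sum, then take the root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two dimensions: the classic 3-4-5 right triangle.
d2 = euclidean_distance([0, 0], [3, 4])

# Three dimensions: the same formula, one more term in the sum.
d3 = euclidean_distance([1, 2, 3], [4, 6, 3])

print(d2, d3)
```

The same function works for any number of features, which is the point of the n-dimensional generalization.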
- - - - Shows explicitly the hierarchy of the clusters
- - - - No simple answer to how many neighbors (k) should be used in k-NN; it depends on the problem
        
        Weighted voting, or similarity-moderated voting
        
        The more similar a neighbor is, the more influence its vote has
        
        Can conduct cross-validation or other nested holdout testing on the training set for a variety of values of k, searching for the one that performs best
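The two ideas above (similarity-weighted voting and searching over k with holdout testing) can be sketched as follows. This is an illustration under stated assumptions: the toy data set, the inverse-squared-distance weighting, and the use of leave-one-out as the cross-validation scheme are all choices made here, not prescribed by the notes.

```python
import math
from collections import defaultdict

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k, weighted=True):
    """train is a list of (features, label) pairs."""
    neighbors = sorted(train, key=lambda row: distance(row[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = distance(features, query)
        # Assumed weighting: closer neighbors get a larger say.
        # The small constant avoids division by zero on exact matches.
        votes[label] += 1.0 / (d * d + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

def leave_one_out_accuracy(data, k, weighted=True):
    """Hold out each example in turn and predict it from the rest."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k, weighted) == label
    return hits / len(data)

# Hypothetical toy training set with two classes.
data = [
    ([1.0, 1.0], "a"), ([1.2, 0.8], "a"), ([0.9, 1.1], "a"), ([1.1, 1.3], "a"),
    ([3.0, 3.0], "b"), ([3.2, 2.9], "b"), ([2.8, 3.1], "b"), ([3.1, 3.3], "b"),
]

# Try several values of k and keep the one with the best holdout accuracy.
scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real data the scores would typically differ across k, and ties would need a tie-breaking rule such as preferring the smaller or larger k.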
    - - target variables
      - classes
      - weight of the variable on the data
    - - In some fields nearest-neighbor methods will work, and in others they won't
      - the justification of a specific decision vs the intelligibility of an entire model
        
        i.e., in some settings it is enough to justify each decision on a case-by-case basis, while in others the model as a whole must be understandable
      - curse of dimensionality
        
        irrelevant variables
        
        asking what variables are relevant
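A small illustration of why irrelevant variables matter for nearest-neighbor methods: every feature contributes to the distance, so a variable unrelated to the target can change which neighbor is closest. The points and feature values below are hypothetical, chosen only to make the effect visible.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One relevant feature: the query sits next to the same-class point.
query, same_class, other_class = [1.0], [1.1], [3.0]
nearest_relevant_only = (
    "same" if distance(query, same_class) < distance(query, other_class) else "other"
)

# Add a second, irrelevant feature with a wide spread of values.
query_n, same_n, other_n = [1.0, 0.0], [1.1, 5.0], [3.0, 0.5]
nearest_with_noise = (
    "same" if distance(query_n, same_n) < distance(query_n, other_n) else "other"
)

print(nearest_relevant_only, nearest_with_noise)
```

This is why asking which variables are relevant, and dropping or down-weighting the rest, is part of applying nearest-neighbor methods well.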