Provost: Ch 6: Similarity, Neighbors and Clustering - Up to Clustering
Introduction
Data mining groups things by similarity
Seen implicitly wherever boundaries are created to define groups
This section examines similarity directly
Similarity can be used in classification and regression
Clusters - unsupervised segmentation
Similarity and Distance
Similarity requires a numeric representation of the properties of objects
Euclidean Distance
Works for more than 2 dimensions
Manhattan
Dummy code categorical variables
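The two distance measures and the dummy-coding step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the book's code: the helper names (`euclidean`, `manhattan`, `dummy_code`) and the sample values are assumptions made up for the example.

```python
import math

def euclidean(a, b):
    """Straight-line distance; works for any number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute differences along each dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

def dummy_code(value, categories):
    """Turn one categorical value into a list of 0/1 indicator features."""
    return [1.0 if value == c else 0.0 for c in categories]

# Hypothetical three-dimensional points.
p = [3.0, 4.0, 0.0]
q = [0.0, 0.0, 0.0]
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7.0

# Hypothetical categorical feature with three possible values.
print(dummy_code("green", ["red", "green", "blue"]))  # [0.0, 1.0, 0.0]
```

Once categorical values are dummy coded, they can be appended to the numeric features and passed to either distance function.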
Nearest Neighbor Reasoning
Classification
Probability Estimation
Regression
Use the classes of the nearest neighbors in a sample to estimate class probabilities
Give closer neighbors more weight
Use inverse squared distance to determine contribution weights
Scale the dimensions so no single one dominates the distance
Determine the right number of neighbors (k)
Cross-validation
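The nearest-neighbor steps above (scaling, inverse-squared-distance weighting, probability estimation, and choosing k by cross-validation) could be put together roughly as follows. This is a sketch under stated assumptions: the function names, the tiny age/income data set, and the use of leave-one-out as the cross-validation scheme are all illustrative choices, not taken from the chapter.

```python
import math
from collections import defaultdict

def fit_min_max(rows):
    """Return a function that rescales each feature to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by zero
    def scale(row):
        return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]
    return scale

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_class_probabilities(train_x, train_y, query, k, eps=1e-9):
    """Weight each of the k nearest neighbors by 1 / distance^2, then normalize."""
    pairs = sorted(zip(train_x, train_y), key=lambda p: euclidean(p[0], query))
    weights = defaultdict(float)
    for x, label in pairs[:k]:
        # eps keeps the weight finite when a neighbor coincides with the query
        weights[label] += 1.0 / (euclidean(x, query) ** 2 + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

def knn_classify(train_x, train_y, query, k):
    probs = knn_class_probabilities(train_x, train_y, query, k)
    return max(probs, key=probs.get)

def loo_accuracy(xs, ys, k):
    """Leave-one-out cross-validation: predict each point from all the others."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_classify(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)

# Hypothetical data: [age, income] with a made-up yes/no target.
raw_x = [[25, 40000], [30, 45000], [35, 50000],
         [50, 90000], [55, 95000], [60, 100000]]
ys = ["no", "no", "no", "yes", "yes", "yes"]

# For brevity the scaler is fit once on all rows; a stricter setup would
# refit it inside each cross-validation fold.
scale = fit_min_max(raw_x)
xs = [scale(row) for row in raw_x]

query = scale([33, 48000])
print(knn_class_probabilities(xs, ys, query, k=5))
print(knn_classify(xs, ys, query, k=3))

# Compare candidate values of k by their leave-one-out accuracy.
for k in (1, 3, 5):
    print(k, loo_accuracy(xs, ys, k))
```

Without the scaling step, income (in the tens of thousands) would swamp age in the distance calculation. For regression, the same weights could be used to take a weighted average of the neighbors' target values instead of a weighted vote.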
Issues
Curse of dimensionality
Too many dimensions mask the important ones
Fast to train, slow to predict
Easy to interpret, but no knowledge gained
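The masking effect listed under the curse of dimensionality can be seen in a small hand-built example. The points and feature values below are invented purely for illustration: the first feature is the one that matters, and the three appended features are assumed to be irrelevant.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, candidates):
    """Return the name of the candidate point closest to the query."""
    return min(candidates, key=lambda name: euclidean(candidates[name], query))

# Only the important feature: "similar" is clearly the closer point.
query_1d = [0.0]
points_1d = {"similar": [0.1], "different": [0.9]}
print(nearest(query_1d, points_1d))   # similar

# Same points with three irrelevant features appended (made-up values).
query_4d = [0.0, 0.0, 0.0, 0.0]
points_4d = {
    "similar": [0.1, 1.0, 1.0, 1.0],
    "different": [0.9, 0.0, 0.0, 0.0],
}
print(nearest(query_4d, points_4d))   # different
```

The irrelevant features contribute more to the distance than the important one, so the nearest neighbor flips. Feature selection or domain-informed weighting of dimensions is one way to counter this.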