PROVOST CHAPTER 6: SIMILARITY, NEIGHBORS & CLUSTERS
Predictive Modeling
"given a new example whose target variable we want to predict, we scan through all the training examples and choose several that are the most similar to the new example"
Regression: take the 3 (or so) nearest neighbors, look up their values for the target variable (e.g., income), and combine those values into a prediction by taking their average or median (see the sketch below)
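A minimal sketch of this idea, not taken from the book: the function name, the toy data, and the choice of Euclidean distance are assumptions for illustration only.

```python
import numpy as np

def knn_regress(new_example, X_train, y_train, k=3, use_median=False):
    """Predict a numeric target (e.g., income) for new_example by
    combining the targets of its k most similar training examples."""
    # Euclidean distance from the new example to every training example
    distances = np.sqrt(((X_train - new_example) ** 2).sum(axis=1))
    # Indices of the k nearest (smallest-distance) training examples
    nearest = np.argsort(distances)[:k]
    # Average or median of the neighbors' target values
    return np.median(y_train[nearest]) if use_median else np.mean(y_train[nearest])

# Hypothetical data: two numeric features per person, target is income
X_train = np.array([[35, 12], [52, 18], [28, 10], [44, 16]], dtype=float)
y_train = np.array([40_000, 85_000, 32_000, 70_000], dtype=float)

print(knn_regress(np.array([40.0, 15.0]), X_train, y_train, k=3))
```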
Nearest Neighbor
perform prediction tasks by calculating the similarity between a new example and a set of training examples with known values for the target
we can then use these neighbors for data-mining tasks: classification, regression, and instance scoring (see the sketch below)
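A hedged sketch of the classification and scoring cases (regression is sketched above): classify by majority vote among the neighbors, score by the fraction of neighbors in the positive class. The function name, labels, and data are hypothetical.

```python
from collections import Counter
import numpy as np

def knn_classify_and_score(new_example, X_train, labels, k=3):
    """Classify a new example by majority vote among its k nearest
    neighbors, and score it by the fraction of those neighbors that
    belong to the positive class ("yes")."""
    distances = np.sqrt(((X_train - new_example) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    neighbor_labels = [labels[i] for i in nearest]
    predicted_class = Counter(neighbor_labels).most_common(1)[0][0]
    score = neighbor_labels.count("yes") / k  # probability-like score
    return predicted_class, score

X_train = np.array([[35, 12], [52, 18], [28, 10], [44, 16]], dtype=float)
labels = ["no", "yes", "no", "yes"]
print(knn_classify_and_score(np.array([40.0, 15.0]), X_train, labels, k=3))
```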
Manhattan Distance
$d_{\text{Manhattan}}(X, Y) = \lVert X - Y \rVert_1 = |x_1 - y_1| + |x_2 - y_2| + \cdots$
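A small sketch of the formula in code (function name assumed for illustration):

```python
import numpy as np

def manhattan_distance(x, y):
    """L1 distance: sum of absolute differences across all features."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.abs(x - y).sum()

print(manhattan_distance([1, 2, 3], [4, 0, 3]))  # |1-4| + |2-0| + |3-3| = 5.0
```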
Jaccard Distance
$d_{\text{Jaccard}}(X, Y) = 1 - \dfrac{|X \cap Y|}{|X \cup Y|}$
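A small sketch treating X and Y as sets (e.g., sets of items two customers have in common); the function name and example sets are assumptions:

```python
def jaccard_distance(x, y):
    """1 minus the size of the intersection over the size of the union."""
    x, y = set(x), set(y)
    return 1 - len(x & y) / len(x | y)

print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 1 - 2/4 = 0.5
```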
Cosine Distance
$d_{\text{cosine}}(X, Y) = 1 - \dfrac{X \cdot Y}{\lVert X \rVert_2 \, \lVert Y \rVert_2}$
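A small sketch of the formula (function name and vectors assumed); because it divides by the vector norms, the distance ignores overall length and depends only on direction:

```python
import numpy as np

def cosine_distance(x, y):
    """1 minus the cosine of the angle between the two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return 1 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(cosine_distance([1, 0, 1], [2, 0, 2]))  # ~0.0: same direction, different length
```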
Clustering
Hierarchical clustering: "a clustering because it groups the points by their similarity. Notice that the only overlap between clusters is when one cluster contains other clusters. Because of this structure, the circles actually represent a hierarchy of clusterings"
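A minimal sketch of hierarchical (agglomerative) clustering using SciPy, not from the book; the toy points and the choice of Ward linkage are assumptions. The linkage matrix encodes the full merge hierarchy (the "nested circles"), and cutting it at a chosen level yields one particular clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical 2-D points; agglomerative clustering repeatedly merges
# the most similar points/clusters, producing a hierarchy of clusterings.
points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0]])

# Linkage matrix = the whole hierarchy of merges
Z = linkage(points, method="ward")

# Cut the hierarchy at the 3-cluster level to get one flat clustering
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```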