Ch 6 similarity, neighbors, & clusters
nearest neighbor reasoning
find most similar neighbors
define data to understand similarity
can be used for predictive modeling
use neighbors to predict target value
probability estimation
regression
k-NN (k nearest neighbors)
K = # of NNs
odd K preferred (avoids tied votes)
K = N
entire data set is being used
determining K:
weighted voting
similarity moderated voting
weighted scoring
reduces importance of deciding exactly which neighbors (k) to use
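The weighted (similarity-moderated) voting idea above can be sketched as follows; the 1/distance weighting is one common choice of weight, assumed here since the notes leave the weighting function open:

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def weighted_vote(query, examples, k=3):
    """k-NN with similarity-moderated voting: each of the k nearest
    neighbors votes for its class, weighted by 1 / distance, so closer
    neighbors count more and the exact choice of k matters less."""
    neighbors = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        d = euclidean(query, point)
        votes[label] += 1.0 / (d + 1e-9)  # small epsilon avoids division by zero
    return max(votes, key=votes.get)
```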
similarity and distance
Euclidean distance
find hypotenuse
can be > 2D
geometry
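The "find the hypotenuse" idea generalizes directly past 2-D — Euclidean distance is the square root of the summed squared per-attribute differences, in any number of dimensions:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of any dimensionality:
    in 2-D this is exactly the hypotenuse of the right triangle
    formed by the per-attribute differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```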
geometric interpretations, overfitting, & complexity
k-NN classifier
K is the complexity parameter
k = N -> least complex (every prediction uses the whole data set)
k = 1 -> most complex (fits each individual training point)
Issues w/ NN methods
intelligibility
justification of a specific decision
intelligibility of entire model
dimensionality & domain knowledge
computational efficiency
Heterogeneous attributes
diff scales
e.g. age 18-100 vs salary $1K to $10,000K
diff types of data
e.g. sex (M/F) vs salary
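One common fix for the different-scales problem (assumed here — min-max normalization, not the only option) is to rescale each numeric attribute to [0, 1] before computing distances, so salary doesn't drown out age:

```python
def min_max_scale(values):
    """Rescale one numeric attribute's values to [0, 1] so that
    attributes on very different scales (e.g. age vs. salary)
    contribute comparably to a distance calculation."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```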
Other distance funcs
Manhattan distance
Jaccard distance
cosine distance
edit distance
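Three of the alternative distance functions listed above can be sketched in a few lines each (edit distance is omitted; it needs a dynamic-programming table):

```python
import math

def manhattan(a, b):
    """Sum of absolute per-attribute differences ("city block" distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_distance(a, b):
    """For two sets: 1 - |intersection| / |union|; useful when examples
    are sets of items rather than numeric vectors."""
    a, b = set(a), set(b)
    return 1 - len(a & b) / len(a | b)

def cosine_distance(a, b):
    """1 - cosine of the angle between two vectors; ignores magnitude,
    so it cares about direction (proportions) only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)
```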
Combining functions
majority scoring function
similarity-moderated:
classification
scoring
regression
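For the regression case of a similarity-moderated combining function, a minimal sketch (assuming 1/distance as the similarity weight): the prediction is the similarity-weighted average of the neighbors' numeric target values.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_regress(query, examples, k=3):
    """Similarity-moderated regression: predict the weighted average of
    the k nearest neighbors' targets, with weight 1 / distance so that
    closer neighbors pull the estimate harder."""
    neighbors = sorted(examples, key=lambda ex: euclidean(query, ex[0]))[:k]
    num = den = 0.0
    for point, target in neighbors:
        w = 1.0 / (euclidean(query, point) + 1e-9)  # epsilon avoids /0
        num += w * target
        den += w
    return num / den
```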
Clusters
finding natural groupings in data
hierarchical cluster
highest level = one cluster containing everything
lowest level = each point is its own cluster
clustering around centroids
possible results
dendrogram
set of cluster centers & their corresponding data
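The centroid-based result above can be sketched with k-means, one standard instance of clustering around centroids (the naive first-k initialization is an assumption for brevity, not a recommendation):

```python
import math

def kmeans(points, k, iters=20):
    """Clustering around centroids: alternately assign each point to its
    nearest center, then move each center to the mean of its assigned
    points. Returns the cluster centers and their corresponding data.
    Initial centers are just the first k points -- fine for a sketch."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[i].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave an empty cluster's center where it was
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, clusters
```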