Provost Ch. 6: Clustering
Similarity
data mining based on grouping similar things
retrieve similar things
group similar items into clusters
do classification and regression
provide recommendations
Distance
space between data instances
can be measured with Euclidean distance
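A minimal sketch of the distance idea above, assuming every instance is a tuple of numeric attributes. The function name `euclidean_distance` and the sample points are illustrative, not from the chapter.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric instances."""
    if len(a) != len(b):
        raise ValueError("instances must have the same number of attributes")
    # Square the difference on each attribute, sum, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical instances with two numeric attributes each.
origin = (0.0, 0.0)
other = (3.0, 4.0)
distance = euclidean_distance(origin, other)
```

A smaller value means the two instances are more similar. Attributes on very different scales usually need rescaling first so that one attribute does not dominate the sum.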
nearest neighbors: the most similar instances
predictive modeling
based on similar data instances from the past
probability estimation
how many neighbors? (k)
weighted voting of neighbors
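The nearest-neighbor points above (choosing k, weighted voting, probability estimation) can be sketched roughly as follows. The weight 1 / (1 + distance²) is one common choice, assumed here for illustration. The function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def knn_class_probabilities(query, training, k=3):
    """Estimate class probabilities for `query` from its k nearest neighbors.

    `training` is a list of (feature_vector, label) pairs.
    Closer neighbors get larger votes (assumed weight: 1 / (1 + d^2)).
    """
    # Keep the k training instances closest to the query.
    neighbors = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(query, features)
        votes[label] += 1.0 / (1.0 + d * d)
    total = sum(votes.values())
    # Normalize the weighted votes so they sum to 1.
    return {label: weight / total for label, weight in votes.items()}

# Hypothetical past instances: two numeric attributes and a yes/no label.
training = [
    ((1.0, 1.0), "yes"),
    ((1.5, 2.0), "yes"),
    ((1.2, 0.8), "yes"),
    ((5.0, 5.0), "no"),
    ((6.0, 5.5), "no"),
]
probs = knn_class_probabilities((1.0, 1.2), training, k=4)
prediction = max(probs, key=probs.get)
```

With k=4 an unweighted vote would give "yes" a share of 3/4. Distance weighting pushes that estimate higher because the single "no" neighbor is far away.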
problems
intelligibility
justification of decision
intelligibility of model
nearest-neighbor methods should be avoided if the two concerns above are critical
dimensionality and domain knowledge
in high-dimensional data, similarity may be driven by irrelevant attributes
computational efficiency
heterogeneous attributes
Euclidean distance cannot be applied directly
categorical variables
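One possible workaround for mixed numeric and categorical attributes, sketched under assumptions not stated in the chapter: scale each numeric difference by that attribute's range, count a categorical mismatch as 1 and a match as 0, then average (a Gower-style distance). The function name, attribute names, and ranges are hypothetical.

```python
def mixed_distance(a, b, numeric_ranges):
    """Average per-attribute dissimilarity for records with mixed types.

    `a` and `b` are dicts mapping attribute name to value.
    `numeric_ranges` maps each numeric attribute to its (min, max) in the data;
    any attribute missing from it is treated as categorical.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            low, high = numeric_ranges[name]
            span = high - low
            # Range-scaled absolute difference, so each attribute lies in [0, 1].
            total += abs(a[name] - b[name]) / span if span else 0.0
        else:
            # Categorical attribute: 0 if the values match, 1 otherwise.
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)

# Hypothetical records and attribute ranges.
ranges = {"age": (20, 60), "income": (20_000, 120_000)}
alice = {"age": 30, "income": 50_000, "housing": "rent"}
bob = {"age": 50, "income": 50_000, "housing": "own"}
d = mixed_distance(alice, bob, ranges)
```

Because every attribute contributes a value between 0 and 1, no single attribute dominates the result the way raw income would under plain Euclidean distance.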
clustering
hierarchical clustering
collection of ways to group points
see landscape of data similarity
ex: tree of life
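A rough sketch of hierarchical clustering in its bottom-up (agglomerative) form: start with every point as its own cluster and repeatedly merge the two closest clusters. Single linkage (distance between the closest pair of members) is assumed here; other linkage choices give different trees. The function name and points are illustrative.

```python
import math

def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a sorted tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def linkage(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        # Evaluate every pair of current clusters and pick the closest pair.
        pairs = [(linkage(c1, c2), c1, c2)
                 for idx, c1 in enumerate(clusters)
                 for c2 in clusters[idx + 1:]]
        dist, c1, c2 = min(pairs)
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, dist))
    return merges

# Hypothetical 2-D points: two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (10.0, 0.0)]
merges = agglomerative(points)
```

The ordered list of merges is the information a dendrogram draws: early merges join very similar points, later merges join increasingly dissimilar groups.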
clustering around centroids
k-means algorithm
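A minimal sketch of k-means, assuming numeric points and caller-supplied starting centroids (real implementations usually pick the starting centroids randomly and rerun several times). It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. All names and data are illustrative.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Cluster `points` around len(initial_centroids) centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        # A centroid with no members stays where it is.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two visually separate groups, k = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = k_means(points, initial_centroids=[(1.0, 1.0), (8.0, 8.0)])
```

The result depends on k and on the starting centroids, so in practice both are chosen with some care.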