Provost, Chapter 6: Clustering
the idea of finding natural groups in the data, without supervision
objects within a cluster are similar to each other; objects across clusters are not
example: whiskey analytics
exploration of different tastes of whiskey to cater to different customers
hierarchical clustering
group points by similarity
only overlap allowed is when one cluster wholly contains another
aka hierarchy
highest level: one cluster containing all the data
lowest level: each data point is its own cluster
dendrogram
x-axis: the individual data points
y-axis: the distance at which clusters merge
cutting it at any height gives a different way to cluster
shows the landscape of data similarity
each point starts as its own node, then the most similar nodes merge step by step
example: tree of life
hierarchical phylogenetic chart of all life
start w/ the most similar species before joining the others (see sketch below)
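A minimal sketch of the merging idea, assuming SciPy/Matplotlib are available; the toy 2-D points are invented for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# toy 2-D points, invented for illustration
points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                   [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# each point starts as its own cluster; the two most similar
# clusters are merged repeatedly until only one remains
merges = linkage(points, method="ward")

# dendrogram: x-axis = data points, y-axis = merge distance
dendrogram(merges)
plt.ylabel("cluster distance")
plt.show()
```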
nearest neighbors revisited: clustering around centroids
represent each cluster by its center
the centroid, i.e. the average of the cluster's points
k-means: first choose how many clusters (k) you want
assign each point to whichever centroid is closest
keep recalculating the centroids & reassigning points until nothing shifts
also can look @ distortion
aka the sum of squared differences between points and their centroid
k-means is efficient in runtime!
choosing k: which value creates the best result
the minimum k where the distortion stabilizes (see sketch below)
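A sketch of picking k by distortion, assuming scikit-learn (its `inertia_` attribute is the sum of squared distances to the nearest centroid); the data is invented:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))  # toy data, invented for illustration

# distortion always shrinks as k grows, so pick the smallest k
# after which it stops improving much (where it stabilizes)
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # inertia_ = distortion
```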
example: business stories
TRC2 corpus: the news stories mentioning Apple (AAPL)
data preparation
words too rare or too frequent were eliminated
TFIDF (term frequency x inverse document frequency) score
gives a score for each vocabulary word in each document (see sketch below)
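A sketch of the same preparation, assuming scikit-learn's TfidfVectorizer; the stand-in documents and the min_df/max_df thresholds are invented:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# toy stand-ins for the TRC2 news stories
docs = ["Apple shares rise after strong earnings report",
        "Analysts debate Apple iPhone sales figures",
        "Earnings season brings strong reports across tech"]

# min_df / max_df drop words that are too rare or too frequent
vectorizer = TfidfVectorizer(min_df=1, max_df=0.9, stop_words="english")
tfidf = vectorizer.fit_transform(docs)  # one TFIDF score per word per doc
print(vectorizer.get_feature_names_out())
```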
news story clusters
9 clusters based on various things
just as correlation is not causation, syntactic similarity is not semantic similarity
understanding the results
cluster names might not be meaningful!
look at the other members of the cluster too
supervised learning
use it to find what differentiates the clusters
generate a classifier w/ cluster membership as the target
a decision tree shows what distinguishes each cluster (see sketch below)
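A sketch of this trick, assuming scikit-learn and invented toy features: cluster first, then fit a decision tree with membership in one cluster as the label, so the tree's splits describe what sets that cluster apart:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # toy features, invented

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# one-vs-rest: what differentiates cluster 0 from everything else?
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, labels == 0)
print(export_text(tree, feature_names=["f0", "f1", "f2"]))
```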
stepping back
the problem itself may be vague
the answer might not be cut and dried
spend more time on problem formulation if the problem is vague