Chapter 6: Similarity
Data Mining: Often group by similarity
Search for "right" similarity
Can be used for classification & regression
Group into clusters
Provide recommendations (e.g., people who like "x" also like "y")
Similarity + distance to create predictive models
Data points near each other are treated similarly
Create classification boundaries
Compute the distance between data points, e.g., with "Euclidean distance"
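To make the distance idea concrete, here is a minimal sketch of Euclidean distance between two numeric feature vectors. The function name and the toy customers are assumptions for illustration, not taken from the chapter.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of attributes")
    # Square the difference on each attribute, sum, then take the root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical customers described by (age, years at current address)
customer_a = (23, 2)
customer_b = (26, 6)
print(euclidean_distance(customer_a, customer_b))  # 5.0
```

The smaller the distance, the more similar the two items are considered to be.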
Nearest Neighbor Reasoning
Predict a new target value based on the nearest neighbors' known target values
How many neighbors?
Use an odd number to avoid ties in majority-vote classification (two classes)
Referred to as k-NN
Heterogeneous attributes pose a problem
Fix with feature selection or by tuning the similarity/distance function manually
Attributes must be coded numerically
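The nearest-neighbor steps above can be sketched as follows. This is a minimal illustration, assuming purely numeric attributes, Euclidean distance, and a simple unweighted majority vote; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; features are numeric tuples.
    """
    # Sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbors wins.
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: (feature vector, class label)
train = [
    ((1.0, 1.0), "no"),
    ((1.5, 2.0), "no"),
    ((2.0, 1.5), "no"),
    ((6.0, 6.5), "yes"),
    ((7.0, 6.0), "yes"),
    ((6.5, 7.0), "yes"),
]
print(knn_predict(train, (2.0, 2.0), k=3))  # no
print(knn_predict(train, (6.0, 6.0), k=3))  # yes
```

Note that an attribute measured on a much larger scale than the others would dominate the distance here, which is one reason the distance function may need tuning when attributes are heterogeneous.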
Clustering: group similar objects together and keep different groups far apart
Hierarchical: Group by broadening characteristics
Dendrogram: shows hierarchy of clusters
Clusters are merged until one cluster remains
Linkage function: distance between two clusters, e.g., Euclidean distance between the closest points in each cluster
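The merging process can be sketched as below: start with every point in its own cluster and repeatedly merge the two closest clusters until one remains. This is a small illustration, assuming the closest-points linkage described above (often called single linkage); the function names and toy points are hypothetical, and the brute-force search is only suitable for tiny data sets.

```python
import math

def single_linkage(c1, c2):
    """Distance between two clusters: closest pair of points, one from each."""
    return min(math.dist(p, q) for p in c1 for q in c2)

def agglomerate(points):
    """Merge the two closest clusters repeatedly; return the merge history."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = min(
            pairs,
            key=lambda ab: single_linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Remove the two merged clusters (j > i, so pop j first), add their union.
        clusters.pop(j)
        clusters.pop(i)
        clusters.append(merged)
    return merges

# Hypothetical toy points
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
merges = agglomerate(points)
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

The merge history (which clusters joined, and at what distance) is the information a dendrogram draws: for these four points the merges happen at distances 1.0, 1.5, and 5.0.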
Clustering Around Centroids
Each cluster is represented by its center, the "centroid"
Take the mean of the cluster's points and move the centroid there; repeat until the centroids stop moving
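The centroid update can be sketched as an alternating loop, commonly known as k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a minimal sketch; the starting centroids, function names, and toy points are assumptions chosen for illustration.

```python
import math

def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            mean_point(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy points and hand-picked starting centroids
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

The result depends on the starting centroids, so in practice the procedure is often run several times from different starting points.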
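Mix clustering with supervised learning
Run a supervised algorithm on the cluster labels to create a classifier
The classifier gives a differential description for each cluster (what distinguishes it from the others)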
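One way to picture a differential description: treat the cluster assignment as a class label and learn a rule that separates one cluster from all the others. The sketch below swaps in a deliberately simple learner, a single-attribute threshold rule, purely for illustration; the notes do not say which algorithm to use. The data, cluster labels, and names are hypothetical.

```python
def best_rule(rows, labels, target, feature_names):
    """Find the one-attribute threshold rule that best separates
    cluster `target` from all other clusters.

    Returns (accuracy, rule description).
    """
    is_target = [lab == target for lab in labels]
    best = None
    for f, name in enumerate(feature_names):
        values = sorted({row[f] for row in rows})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for op in ("<=", ">"):
                # Count rows where the rule's prediction matches membership.
                correct = sum(
                    ((row[f] <= t) if op == "<=" else (row[f] > t)) == tgt
                    for row, tgt in zip(rows, is_target)
                )
                acc = correct / len(rows)
                if best is None or acc > best[0]:
                    best = (acc, f"{name} {op} {t}")
    return best

# Hypothetical customers (age, monthly spend) with cluster labels
# assumed to come from an earlier clustering step.
rows = [(25, 200), (30, 220), (28, 210), (55, 80), (60, 90), (58, 85)]
labels = [0, 0, 0, 1, 1, 1]
names = ["age", "spend"]
print(best_rule(rows, labels, 0, names))  # (1.0, 'age <= 42.5')
print(best_rule(rows, labels, 1, names))  # (1.0, 'age > 42.5')
```

Each rule reads as a short description of what sets that cluster apart, which is usually easier to communicate than a centroid alone.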
Solving Business Problems
Define the target and understand the problem that needs answers
Perform unsupervised segmentation
Separate into supervised and unsupervised subproblems
Check whether the results solve the problem
Creativity must be applied