Chapter 6
Clustering
objects within a group are similar to one another; objects in different groups are not
Hierarchical Clustering
groups points by similarity, building a hierarchy of nested clusters
Clustering around Centroids
represents each cluster by its center (centroid)
Results
Dendrograms
Set of cluster centers
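As a rough illustration of how hierarchical clustering produces a dendrogram, here is a minimal sketch in plain Python. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members); complete or average linkage are equally valid choices, and the function name `single_linkage` and the sample points are hypothetical. The returned merge history, with the distance at which each merge happens, is exactly what a dendrogram draws.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line (Euclidean) distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points):
    """Agglomerative clustering with single linkage (an assumed choice).

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters. Returns the merge history as (members_a, members_b,
    distance) tuples; the distances become the heights in a dendrogram.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: distance of the closest pair across the two clusters.
            d = min(euclidean(points[i], points[j]) for i in c1 for j in c2)
            if best is None or d < best[2]:
                best = (c1, c2, d)
        c1, c2, d = best
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), d))
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
for left, right, dist in single_linkage(points):
    print(left, right, round(dist, 2))
# [0] [1] 1.0
# [2] [3] 1.0
# [0, 1] [2, 3] 6.4
# [4] [0, 1, 2, 3] 7.07
```

Cutting the dendrogram at a chosen height (say 5.0 here) yields a flat set of clusters. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally replace this quadratic-per-step loop.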
Nearest-Neighbor Reasoning
not mutually exclusive
Prediction Modeling
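given a new example, find the training examples most similar to it and use their known target values
Classification
find the nearest neighbors and predict the majority class among their targets
Probability Estimation
assign a score, e.g., the fraction of neighbors that belong to the class
Regression
predict a numeric value, e.g., the average or median of the neighbors' targets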
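The three prediction modes above can be sketched with one shared neighbor search. This is a minimal, brute-force sketch assuming Euclidean distance, training data stored as hypothetical `(features, target)` pairs, and a plain mean for regression (the median would work equally well).

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(train, query, k):
    """Return the k (features, target) pairs in `train` closest to `query`."""
    return sorted(train, key=lambda ex: euclidean(ex[0], query))[:k]


def knn_classify(train, query, k):
    """Classification: majority vote over the targets of the k nearest neighbors."""
    votes = Counter(target for _, target in nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_probability(train, query, k, positive):
    """Probability estimation: fraction of the k nearest neighbors in class `positive`."""
    neighbors = nearest(train, query, k)
    return sum(1 for _, t in neighbors if t == positive) / k


def knn_regress(train, query, k):
    """Regression: mean of the k nearest neighbors' numeric targets."""
    neighbors = nearest(train, query, k)
    return sum(t for _, t in neighbors) / k


train_cls = [((1.0, 1.0), "yes"), ((1.0, 2.0), "yes"), ((2.0, 1.0), "no"),
             ((8.0, 8.0), "no"), ((9.0, 8.0), "no")]
print(knn_classify(train_cls, (1.2, 1.2), k=3))  # yes
print(knn_probability(train_cls, (1.2, 1.2), k=3, positive="yes"))

train_reg = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]
print(knn_regress(train_reg, (2.1,), k=3))  # 20.0
```

Note that the same neighbor set drives all three outputs; only the way the neighbors' targets are combined changes.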
Issues
Intelligibility
Justification of Decision
Intelligibility of Entire Model
Dimensionality
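too many attributes, many of them possibly irrelevant, can distort the similarity calculation (the curse of dimensionality)
Computational Efficiency
prediction can be expensive or impractical, because each new example must be compared with the stored training examples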
Summary
Similarity ranging
Exploratory Data
Spend less time at outset
More time on evaluation stage
Similarity of Two Objects
distance between them in the instance space, used as a proxy for similarity (closer means more similar)
Nearest-Neighbor Methods
explicitly calculate the similarity between a new example and the training examples
Heterogeneous Attributes
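computing similarity is more complicated: numeric and categorical attributes on different scales must first be made comparable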
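One common way to make heterogeneous attributes comparable is to rescale numeric attributes to [0, 1] and one-hot encode categorical ones before measuring distance. The sketch below assumes exactly those two choices (other normalizations such as z-scores are also used) and a hypothetical `(age, income, region)` record schema; without scaling, income in dollars would swamp age in years.

```python
import math


def min_max_scale(values):
    """Rescale numbers to [0, 1] so no attribute dominates because of its units."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def one_hot(values):
    """Encode a categorical attribute as 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def build_vectors(records):
    """records: list of (age, income, region) tuples (hypothetical schema)."""
    ages = min_max_scale([r[0] for r in records])
    incomes = min_max_scale([r[1] for r in records])
    regions = one_hot([r[2] for r in records])
    return [[a, i, *reg] for a, i, reg in zip(ages, incomes, regions)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


records = [(25, 40_000, "north"), (35, 42_000, "north"), (60, 150_000, "south")]
vectors = build_vectors(records)
print(euclidean(vectors[0], vectors[1]))  # small: similar age, income, same region
print(euclidean(vectors[0], vectors[2]))  # 2.0
```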
Problem Versus Data Exploration
For a well-defined problem, work to define it as precisely as possible
In data exploration, we may start with only vague notions of what we are looking for
Other Distance Functions
Euclidean Distance
Fast
L2 norm
Manhattan distance (L1 norm)
Jaccard Distance
Cosine distance
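The four distance functions listed above can be written directly from their definitions. This minimal sketch assumes numeric vectors for Euclidean, Manhattan, and cosine distance, and Python sets (e.g., sets of words in a document) for Jaccard distance; the cosine version does not handle all-zero vectors.

```python
import math


def euclidean(a, b):
    """L2 norm of the difference: straight-line distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """L1 norm of the difference: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def jaccard_distance(s, t):
    """1 - |intersection| / |union| for two sets."""
    union = s | t
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1 - len(s & t) / len(union)


def cosine_distance(a, b):
    """1 - cosine of the angle between vectors; ignores magnitude (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norms


print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(jaccard_distance({"a", "b", "c"}, {"b", "c", "d"}))  # 0.5
print(cosine_distance((1, 0), (0, 1)))  # 1.0
```

Cosine distance treats (1, 2) and (2, 4) as essentially identical because they point the same way, which is why it is popular for text, where document length should not dominate similarity.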
Combining Functions: Calculating Scores from Neighbors
Majority vote
Majority scoring function
Similarity-moderated classification
Similarity-moderated scoring
Similarity-moderated regression
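The combining functions differ only in how much each neighbor counts. In the sketch below, plain majority vote gives every neighbor one vote, while the similarity-moderated versions weight each neighbor by the inverse square of its distance; that weighting is one common choice and is an assumption here, as are the helper names and the `(distance, target)` neighbor format.

```python
from collections import Counter, defaultdict


def majority_vote(neighbors):
    """Unweighted: each (distance, label) neighbor casts one vote."""
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def weights(distances, eps=1e-9):
    """Assumed similarity weight: 1 / distance^2; eps avoids division by zero."""
    return [1.0 / (d * d + eps) for d in distances]


def weighted_class_scores(neighbors):
    """Similarity-moderated scoring: each class's share of the total weight."""
    w = weights([d for d, _ in neighbors])
    totals = defaultdict(float)
    for wi, (_, label) in zip(w, neighbors):
        totals[label] += wi
    total = sum(w)
    return {label: s / total for label, s in totals.items()}


def weighted_classify(neighbors):
    """Similarity-moderated classification: class with the largest weighted share."""
    scores = weighted_class_scores(neighbors)
    return max(scores, key=scores.get)


def weighted_regress(neighbors):
    """Similarity-moderated regression: weighted mean of numeric targets."""
    w = weights([d for d, _ in neighbors])
    return sum(wi * t for wi, (_, t) in zip(w, neighbors)) / sum(w)


# One close "yes" neighbor versus two distant "no" neighbors.
neighbors = [(1.0, "yes"), (3.0, "no"), (3.0, "no")]
print(majority_vote(neighbors))      # no
print(weighted_classify(neighbors))  # yes
print(weighted_regress([(1.0, 10.0), (2.0, 20.0)]))
```

The example shows why moderation matters: a single very close neighbor can outweigh several distant ones, whereas a plain majority vote ignores how close each neighbor is.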
Clustering
notions of similarity and distance
Find groups of objects
Ex. Whiskey Analytics
Hierarchical
Allows the data analyst to see the groupings before deciding how many clusters to use
Around Centroids
based on distances between individual instances and the cluster centers
News Stories
Ratings changes and price adjustments
Understanding Results
Examine the corresponding data points (the members of each cluster)
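Clustering around centroids is most often done with k-means, which alternates between assigning each instance to its nearest center and moving each center to the mean of its assigned instances. The sketch below is a plain Lloyd-style k-means, assuming Euclidean distance, initialization from k randomly chosen data points, and a fixed seed for repeatability; the toy points are hypothetical and unrelated to the news-story example. The final loop prints each center with its member points, which is the kind of inspection the understanding-results step calls for.

```python
import math
import random


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=100, seed=0):
    """Plain k-means; returns (centroids, assignment), assignment[i] = cluster of points[i]."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = mean(members)
    return centroids, assignment


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = k_means(points, k=2)
for c, center in enumerate(centroids):
    members = [p for p, a in zip(points, labels) if a == c]
    print("center", center, "members", members)
```

k-means can settle in a local optimum that depends on the starting centers, so it is common to run it from several initializations; library versions such as scikit-learn's `sklearn.cluster.KMeans` do this via their `n_init` parameter.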