Final Review, Clustering, Classification, Association Rule Analysis,…
Classification
What is classification?
identifying which "class" a new observation belongs to, based on data containing observations whose class (outcome) variable is already known
Steps of Classification
- Training Step: construct a classification model based on training data
- Validation Step: refine your classification model on validation data
- Testing Step: measure the accuracy of your (final) model using test data
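The three steps above can be sketched as follows. This is a minimal illustration, not a prescribed method: the k-nearest-neighbour classifier, the 60/20/20 split, the candidate values of k, and all function names are assumptions chosen for the example.

```python
import random

def knn_predict(train, point, k):
    """Predict the majority class among the k training rows nearest to point."""
    # Each row is (features, label); squared Euclidean distance is enough for ranking.
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point))
    )[:k]
    labels = [label for _, label in nearest]
    # Sorting the distinct labels makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)

def accuracy(train, rows, k):
    """Fraction of rows whose predicted class matches the known class."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)

def train_validate_test(data, candidate_ks=(1, 3, 5), seed=0):
    """Return (chosen k, test accuracy) using an assumed 60/20/20 split."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n = len(rows)
    # Training step: for k-NN, "constructing the model" means storing the training rows.
    train = rows[: n * 6 // 10]
    valid = rows[n * 6 // 10 : n * 8 // 10]
    test = rows[n * 8 // 10 :]
    # Validation step: refine the model by choosing the k that scores best on validation data.
    best_k = max(candidate_ks, key=lambda k: accuracy(train, valid, k))
    # Testing step: measure the accuracy of the final model on data it has never seen.
    return best_k, accuracy(train, test, best_k)
```

The test rows are used only once, after k is fixed, so the reported accuracy is not influenced by the tuning done in the validation step.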
Clustering Algorithm
Hierarchical Clustering
- start from individual data points/smaller clusters
- form larger clusters in a hierarchical manner
0: compute distance between points based on a distance measure of choice and create a Distance Matrix
1: consider each data point individually, as its own 1-point cluster
2: merge the two 1-point clusters that are nearest to each other (based on the distance matrix) to form a new cluster
3: merge the two clusters that are closest to each other
4: repeat (3) until only 1 cluster is left, or until k clusters remain (where k is specified by the user)
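Steps 0 to 4 can be sketched in a few lines. The notes do not say how the distance between two multi-point clusters is measured, so this sketch assumes Euclidean distance and single linkage (the distance between the two closest members); the function name `agglomerative` is illustrative.

```python
from math import dist

def agglomerative(points, k=1):
    """Merge the nearest clusters until k remain; returns lists of point indices."""
    n = len(points)
    # Step 0: distance matrix using Euclidean distance (an assumed choice of measure).
    d = [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Step 1: every point starts as its own 1-point cluster.
    clusters = [[i] for i in range(n)]

    def linkage(a, b):
        # Single linkage (assumption): distance between the closest pair of members.
        return min(d[i][j] for i in a for j in b)

    # Steps 2-4: repeatedly merge the two closest clusters until k are left.
    while len(clusters) > k:
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected by the deletion
    return [sorted(c) for c in clusters]
```

Calling `agglomerative(points, k=1)` runs the merging all the way to a single cluster, while a larger `k` stops early, matching the two stopping rules in step 4.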
Regression
Regression Algorithms
Regression Trees
the outcomes in the leaf nodes are determined as the average of the outcome values of the data points in that node
splits are chosen using the sum of squared deviations (SSD) from the average outcome value at each node: the preferred split is the one that gives the lowest total SSD across the resulting nodes
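As a rough illustration of both points, the sketch below scores every candidate split of a single numeric predictor by the combined SSD of the two resulting nodes and predicts the node average in a leaf. Restricting the example to one predictor and placing thresholds at midpoints between neighbouring values are simplifying assumptions, and the function names are illustrative.

```python
def ssd(values):
    """Sum of squared deviations of values from their own average."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_ssd) for the split with the lowest combined SSD.

    Returns (None, ssd(ys)) when no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    best = (None, ssd(ys))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical predictor values cannot be separated
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = ssd(left) + ssd(right)
        if total < best[1]:
            best = (threshold, total)
    return best

def leaf_prediction(ys):
    """A leaf predicts the average outcome of the data points it contains."""
    return sum(ys) / len(ys)
```

A full regression tree would apply `best_split` recursively to each resulting node until a stopping rule is met.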
Regression (Numeric Prediction):
Predicts continuous/numeric values of the dependent variable
Ex: How much will the customer buy? (Quantity or $)
- Model Construction (Training step)
- Model Testing (Testing step)
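The two steps can be illustrated with a deliberately simple model. The sketch assumes a straight-line least-squares fit with a single predictor and uses mean squared error as the test measure; neither choice comes from the notes, and the function names are illustrative.

```python
def fit_line(xs, ys):
    """Model construction: least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, xs, ys):
    """Model testing: average squared gap between predicted and actual outcomes."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

In practice `fit_line` would be called on the training data only, and `mean_squared_error` on held-out test data, so the error reflects how well the numeric predictions generalize.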