Classification
Predictive analytics (supervised learning)
well-defined outcome
Binary - K-NN, decision trees
Numerical - K-NN, Regression tree, linear regression
specify one dependent variable (outcome)
specify independent variables (attributes)
KNN Algorithm
nearest neighbors
For a given observation, identify the nearest observations in the training data
Use a distance metric (e.g., Euclidean)
Decide k (# of neighbors)
Determine normalization (fit AFTER the split, on the training data only)
Calculate the distances
Pick the k nearest neighbors and assign the majority class (see the sketch after this branch)
how to choose k
Small k → overfitting (sensitive to noise/outliers)
Large k → underfitting (may miss the local structure of the data)
Choose the k with the lowest validation error rate
Pro: simple, no statistical assumptions
Con: "lazy" learner (no model built up front) + slow on large data
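A minimal sketch of the K-NN steps above, assuming scikit-learn; the Iris data, the 70/30 split, and the candidate k values are illustrative choices, not part of the notes. It normalizes after the split (scaler fit on training data only) and picks k by validation error:

    # K-NN sketch (scikit-learn assumed; dataset, split, and k values are illustrative)
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # Split FIRST, then normalize: the scaler is fit on the training split only
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
    scaler = StandardScaler().fit(X_train)
    X_train_s, X_val_s = scaler.transform(X_train), scaler.transform(X_val)

    # Euclidean distance is the default metric; try several k and keep the lowest error
    for k in (1, 3, 5, 7, 9):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
        print(f"k={k}: validation error = {1 - knn.score(X_val_s, y_val):.3f}")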
Decision Tree Algorithm
Handles a mix of numeric + binary data
Start at the top (root node)
Work down through the internal (decision) nodes
Bottom (leaf node) contains the class label
Purity
Impure if more than one class has a count > 0 in the node (classes are mixed)
Entropy (measures node impurity)
Entropy = -Σ_k p_k * log2(p_k), where p_k = proportion of class k in the node
weighted entropy
weighted average of the child nodes' entropies (weights = share of observations in each child)
Info gain = entropy(parent) - weighted entropy(children); split on the attribute with the highest gain
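A small worked computation of these quantities for one split; NumPy is assumed, and the class labels below are made up purely for illustration:

    import numpy as np

    def entropy(labels):
        # Entropy = -sum_k p_k * log2(p_k) over the class proportions in a node
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def info_gain(parent, left, right):
        # Info gain = entropy(parent) - weighted average entropy of the children
        n = len(parent)
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - weighted

    parent = np.array([1, 1, 1, 0, 0, 0])                    # impure parent: 3 of each class
    left, right = np.array([1, 1, 1, 0]), np.array([0, 0])   # a candidate split
    print(entropy(parent))                 # 1.0 (maximally impure for two classes)
    print(info_gain(parent, left, right))  # ~0.459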
Stopping Criteria: when all points in a node are the same class
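To see the root, internal, and leaf nodes concretely, a shallow tree can be trained and printed as text; scikit-learn and the Iris data are assumptions here, not part of the notes:

    # Train a shallow tree and print its structure (root -> internal nodes -> leaves with class labels)
    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier, export_text

    iris = load_iris()
    tree = DecisionTreeClassifier(max_depth=2, criterion="entropy", random_state=0)
    tree.fit(iris.data, iris.target)
    print(export_text(tree, feature_names=list(iris.feature_names)))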
Classification Step 1
Model Construction (training)
1) Randomly split the data into 3 sets: training, validation, test
2) Choose a classification algorithm + train it on the training data
Classification Step 2
Model Validation
3) Refine the trained model on the validation data (e.g., tune k or tree depth)
Classification Step 3
Model Testing
4) Test accuracy on test data
5) Good model? classify new data
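A compact sketch of steps 1-5 with a decision tree as the chosen algorithm; scikit-learn, the 60/20/20 split, and the candidate tree depths are assumptions made for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    # 1) Randomly split into training / validation / test (60% / 20% / 20%)
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # 2) + 3) Train candidate models, refine (here: pick the tree depth) on the validation data
    best_depth = max(
        (DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_train, y_train).score(X_val, y_val), d)
        for d in (1, 2, 3, 4, 5)
    )[1]

    # 4) Test accuracy of the chosen model on the held-out test data
    model = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
    print("test accuracy:", model.score(X_test, y_test))

    # 5) Good model? -> classify new data
    print("predicted class:", model.predict([[5.0, 3.4, 1.5, 0.2]]))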
Probabilistic Algorithm
KNN + decision trees also produce class probabilities (share of the k neighbors / class proportion in the leaf)
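A short sketch using scikit-learn's predict_proba; the Iris data and the query point are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

    # Each probability is the fraction of the 5 nearest neighbors belonging to that class
    print(knn.predict_proba([[5.0, 3.4, 1.5, 0.2]]))  # e.g. [[1. 0. 0.]]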
Unbalanced Data
Confusion Matrix ( + - )
error rate : (FN + FP) / (TP + TN + FP + FN )
accuracy : (TP + TN) / (TP + TN + FP + FN )
Precision: TP / (TP + FP) (read along the predicted-positive row, i.e., "horizontally" in the matrix)
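The confusion-matrix quantities above, computed by hand with NumPy; the binary labels below are made up purely for illustration:

    import numpy as np

    # Illustrative binary labels: 1 = positive, 0 = negative
    y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

    TP = np.sum((y_pred == 1) & (y_true == 1))   # 3
    TN = np.sum((y_pred == 0) & (y_true == 0))   # 5
    FP = np.sum((y_pred == 1) & (y_true == 0))   # 1
    FN = np.sum((y_pred == 0) & (y_true == 1))   # 1

    accuracy   = (TP + TN) / (TP + TN + FP + FN)   # 0.8
    error_rate = (FN + FP) / (TP + TN + FP + FN)   # 0.2
    precision  = TP / (TP + FP)                    # 0.75
    print(accuracy, error_rate, precision)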