Classification Summary
Steps in classification:
Training
Validation
Testing
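The three steps are usually run on three separate slices of the data. A minimal sketch, assuming a 60/20/20 split and a fixed shuffle seed (both are illustrative choices, not taken from these notes); the function name is hypothetical:

```python
import random


def train_validation_test_split(rows, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle rows and cut them into training, validation, and testing sets.

    The fractions and the seed are assumptions made for this example.
    Whatever is left after the first two slices becomes the testing set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test


# Ten toy observations, identified only by their index.
train, validation, test = train_validation_test_split(range(10))
```

Every observation ends up in exactly one of the three sets, so the model is never evaluated on data it was trained on.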
Classification algorithms
KNN
How to choose K: pick the K with the lowest error rate on the testing data, where error rate = # of incorrect predictions / # of observations in the testing data
Overfitting: a small K captures local structure but is sensitive to outliers
Steps:
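Choose a specific K and a distance measure (Euclidean is common)
Normalize the data if needed (for example, when features are on very different scales); if so, do it after choosing K and the distance measure and before computing any distances
For each new observation, identify the K nearest existing observations
Classify each new observation as the majority class among its K nearest neighbors; if there is a tie, choose randomly among the tied classes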
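The KNN steps above, together with choosing K by error rate, can be sketched as follows. This is a minimal illustration that assumes Euclidean distance, data that is already normalized, and a tiny made-up data set; the function names and the candidate values of K are hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_points, train_labels, new_point, k, rng=random):
    """Return the majority class among the k nearest training observations."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], new_point),
    )[:k]
    votes = Counter(label for _, label in nearest)
    top = max(votes.values())
    tied = [label for label, count in votes.items() if count == top]
    # On a tie, choose randomly among the tied classes.
    return tied[0] if len(tied) == 1 else rng.choice(tied)


def error_rate(train_points, train_labels, test_points, test_labels, k):
    """# of incorrect predictions / # of observations in the testing data."""
    wrong = sum(
        knn_classify(train_points, train_labels, point, k) != label
        for point, label in zip(test_points, test_labels)
    )
    return wrong / len(test_points)


# Made-up, already-normalized toy data with two well-separated classes.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["a", "a", "a", "b", "b", "b"]
test_points = [(1.1, 1.0), (5.1, 5.0)]
test_labels = ["a", "b"]

# Try a few odd values of K and keep the one with the lowest error rate.
errors = {
    k: error_rate(train_points, train_labels, test_points, test_labels, k)
    for k in (1, 3, 5)
}
best_k = min(errors, key=errors.get)
```

With two classes, trying only odd values of K avoids ties in the vote. When several values of K share the lowest error rate, this sketch keeps the first one it tried.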
Decision Tree
Construct the possible attribute thresholds (candidate splits) and pick the one with the largest information gain as the root node
For subsequent splits, look at the information gain of the remaining attributes within each branch
Terminology:
Root node
Branches
Leaves
Purity/Entropy: the lower the entropy, the purer the data
A pure partition has the same outcome value for every observation
Information gain: entropy of the parent node minus the weighted entropy of its child partitions
Stopping point: when all points are from the same class, or when there are no remaining attributes to split on
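Entropy, information gain, and the choice of the best threshold for one numeric attribute can be sketched as below. This is an illustration under assumptions: entropy is measured in bits (log base 2), candidate thresholds are the midpoints between consecutive distinct values, and the attribute and outcome data are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits; 0 means the partition is pure."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(values, labels, threshold):
    """Parent entropy minus the size-weighted entropy of the two partitions."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    weighted = sum(
        len(part) / len(labels) * entropy(part)
        for part in (left, right)
        if part  # skip an empty side
    )
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate threshold with the largest information gain.

    Assumes the attribute has at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Made-up example: one numeric attribute and a binary outcome.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]

root_threshold = best_threshold(ages, bought)
root_gain = information_gain(ages, bought, root_threshold)
```

In this toy data the split at 32.5 separates the two classes completely, so both partitions are pure and the gain equals the parent entropy of 1 bit. A full tree would repeat the same search inside each branch until a stopping point is reached.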
How accurate is the model?
Confusion matrix: True positive, true negative, false positive, false negative
Error rate
Accuracy
Recall: for each actual class, how many observations the model recovered
Few false negatives mean high recall
Precision: for each predicted class, how many predictions the model got right
Few false positives mean high precision
If we care about both precision and recall -> use the F1 score
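The metrics above all follow from the four cells of the confusion matrix. A minimal sketch for the two-class case, using made-up counts and assuming none of the denominators is zero; the function name is hypothetical:

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the standard metrics from binary confusion-matrix counts.

    Assumes every denominator is nonzero (at least one predicted positive
    and at least one actual positive).
    """
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)  # of the predicted positives, how many were right
    recall = tp / (tp + fn)     # of the actual positives, how many were recovered
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": 2 * precision * recall / (precision + recall),
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high, which is why it suits cases where both kinds of error matter.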