Machine Learning
Process & Metrics
CRISP-DM
1. Business Understanding
- Business objectives
- Assess situation
- Data mining goals
- Project plan
2. Data Understanding
- Collect data
- Describe data
- Explore data
- Verify data quality
3. Data Preparation
- Select data
- Clean data
- Construct data
- Integrate data
- Format data
4. Modeling
- Select modeling technique
- Design the test
- Build model
- Assess model
5. Evaluation
- Evaluate results
- Review process
- Determine next steps
6. Deployment
- Plan deployment
- Plan monitoring
- Plan maintenance
- Final report
- Review project
Metrics
Training Data
Accuracy/metric estimates on the training data are not
a good indicator of performance on future data
They measure the degree of overfitting/underfitting
Independent Test Data
Used when we have plenty of data
Natural way of forming training & test data
Hold-Out Method
Splits the data into training data & test data
Build a classifier using the training data
and test it using the test data
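The hold-out split above can be sketched in plain Python (the 80/20 split ratio, the toy dataset, and the `holdout_split` helper are assumptions for illustration):

```python
import random

def holdout_split(data, test_fraction=0.2, seed=42):
    """Shuffle the data, then hold out a fraction of it as the test set."""
    rng = random.Random(seed)
    shuffled = data[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # First n_test items become the test set, the rest the training set
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(list(range(100)))
print(len(train), len(test))  # 80 20
```

The classifier would then be fit on `train` only and scored on `test`, which it never saw during training.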
Stratification
Process of dividing members of population
into homogeneous subgroups before sampling
Strata should be mutually exclusive
- Every element in the population must be
assigned to only one stratum
Strata should be collectively exhaustive
- No population element can be excluded
- Simple random sampling/Systematic sampling is applied
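A minimal sketch of stratified sampling, assuming a toy population of labelled items and a hypothetical `stratified_sample` helper; strata are mutually exclusive (each item keyed into exactly one group) and collectively exhaustive (no item skipped), with simple random sampling applied within each stratum:

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=0):
    """Divide items into strata by `key`, then draw the same fraction
    from each stratum via simple random sampling."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)  # each item lands in exactly one stratum
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# 80 'a' items and 20 'b' items; a 10% stratified sample keeps the 4:1 ratio
population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(population, key=lambda x: x[0], fraction=0.1)
```

Because sampling happens per stratum, the class proportions of the sample match the population, which plain random sampling only achieves in expectation.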
k-Fold Cross Validation
Avoids overlapping test sets: the data is
split into k equal subsets
Each subset is used in turn for testing,
the remainder for training
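The k-fold scheme can be sketched as an index generator (the `kfold_indices` name and the strided fold assignment are illustrative assumptions):

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k non-overlapping folds.
    Each fold serves once as the test set; the rest form the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]  # fold i: i, i+k, i+2k, ...
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [j for j in range(n) if j not in test_set]
        yield train_idx, test_idx

# 5-fold CV over 10 data points: every point is tested exactly once
for train_idx, test_idx in kfold_indices(10, 5):
    pass  # fit on train_idx, evaluate on test_idx, average the k scores
```

Setting k equal to n turns this into leave-one-out cross validation.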
Leave-One-Out
Sets the number of folds to the number of training instances
Each fold: N-1 training instances, 1 test instance
Bootstrap
Uses sampling with replacement to form the training set
Type of resampling
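Bootstrap resampling can be sketched as follows (the `bootstrap_sample` helper and the use of out-of-bag items as a test set are illustrative assumptions):

```python
import random

def bootstrap_sample(data, seed=0):
    """Draw len(data) items with replacement to form the training set.
    Items never drawn ("out-of-bag") can serve as the test set."""
    rng = random.Random(seed)
    n = len(data)
    train = [data[rng.randrange(n)] for _ in range(n)]
    drawn = set(train)
    oob = [x for x in data if x not in drawn]
    return train, oob

train, oob = bootstrap_sample(list(range(1000)))
# On average ~63.2% of distinct items appear in train; ~36.8% are out-of-bag
```

Because sampling is with replacement, some items appear several times in `train` while others not at all, which is what distinguishes the bootstrap from the hold-out and k-fold schemes.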
Measurement
Confusion Matrix
Precision = TP / (TP+FP)
Recall = TP / (TP+FN)
Error = (FP+FN) / (P+N)
Accuracy = (TP+TN) / (P+N)
FP Rate = FP/N
F1-Score
Harmonic mean of precision and recall
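The formulas above can be collected into one small function (the `confusion_metrics` name and the example counts are assumptions for illustration; P = TP+FN and N = FP+TN are the actual positive and negative counts):

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    p, n = tp + fn, fp + tn          # actual positives, actual negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (p + n)
    error = (fp + fn) / (p + n)
    fp_rate = fp / n
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "error": error, "fp_rate": fp_rate, "f1": f1}

# Example: 40 true positives, 10 false positives, 10 false negatives, 40 true negatives
m = confusion_metrics(tp=40, fp=10, fn=10, tn=40)
```

Note that accuracy = 1 - error, and that F1 balances precision against recall: a classifier that maximizes one at the expense of the other is penalized by the harmonic mean.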