Machine Learning
Process & Metric
CRISP-DM
3. Data Preparation
Select data
Clean data
Construct data
Integrate data
Format data
5. Evaluation
Evaluate results
Review process
Determine next steps
2. Data Understanding
Collect data
Describe data
Explore data
Verify data quality
4. Modeling
Select modeling technique
Design the test
Build model
Assess model
1. Business Understanding
Business objective
Assess situation
DM goals
Project plan
6. Deployment
Plan deployment
Plan monitoring
Plan maintenance
Final report
Review project
Metrics
Training Data
Accuracy/metric estimates on the training data are not a good indicator of performance on future data
Useful for measuring the degree of overfitting/underfitting
Independent Test Data
Used when we have plenty of data
Natural way of forming training & test data
Hold-Out Method
Splits the data into training data & test data
Build a classifier using the training data and test it using the test data
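A minimal sketch of the hold-out split described above, using only the Python standard library. The function name `hold_out_split`, the `test_fraction` default of 0.3, and the fixed `seed` are illustrative assumptions, not part of the original notes.

```python
import random


def hold_out_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


train, test = hold_out_split(range(10), test_fraction=0.3)
```

The classifier would then be fitted on `train` only and scored on `test` only, so the test rows never influence training.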
Stratification
Process of dividing members of the population into homogeneous subgroups before sampling
Strata should be mutually exclusive
Every element in the population must be assigned to only one stratum
Strata should be collectively exhaustive
No population element can be excluded
Simple random sampling or systematic sampling is then applied within each stratum
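The stratification steps above can be sketched as follows: group the population into strata, then draw a simple random sample from each one. The helper `stratified_sample`, its `stratum_of` callback, and the toy population are hypothetical names chosen for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Draw roughly the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        # Each row is assigned to exactly one stratum (mutually exclusive),
        # and no row is skipped (collectively exhaustive).
        strata[stratum_of(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        # Simple random sampling without replacement inside the stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Toy population: 80 rows of class "a" and 20 rows of class "b".
population = [("a", i) for i in range(80)] + [("b", i) for i in range(20)]
sample = stratified_sample(population, stratum_of=lambda r: r[0], fraction=0.1)
```

Because every stratum is sampled at the same rate, the sample keeps the 80/20 class proportions of the population.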
k-Fold Cross Validation
Avoids overlapping test sets
Data is split equally into k subsets
Each subset in turn is used for testing, the remainder for training
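A minimal sketch of how the k folds could be generated as index lists. The function name `k_fold_indices` is an assumption, and the sketch does not shuffle; in practice the data would usually be shuffled (or stratified) first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    The k test sets are disjoint and together cover indices 0..n-1.
    """
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread any remainder over the first n % k folds so sizes differ by at most 1.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(10, 5))
```

A model would be trained and scored once per fold, and the k scores averaged to give the cross-validation estimate.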
Leave-One-Out
Set number of folds to number of training instances
N-1 training instances, 1 test instance
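Leave-one-out is the special case of k-fold where k equals the number of instances. A short sketch, with `leave_one_out` as an illustrative name:

```python
def leave_one_out(instances):
    """Yield (train, test_instance) pairs: N-1 instances for training, 1 for testing."""
    for i in range(len(instances)):
        # Everything except position i is training data.
        yield instances[:i] + instances[i + 1:], instances[i]


data = ["a", "b", "c", "d"]
splits = list(leave_one_out(data))
```

With N instances this produces N splits, so the model is trained N times; that cost is the usual reason to prefer a smaller k on large data sets.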
Bootstrap
Uses sampling with replacement to form the training set
Type of resampling
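A minimal sketch of a bootstrap split. The notes only say the training set is drawn with replacement; using the instances that were never drawn as the test set is a common convention and is an assumption here, as is the name `bootstrap_split`.

```python
import random


def bootstrap_split(rows, seed=0):
    """Draw len(rows) instances with replacement for training.

    Assumption: instances never drawn ("out-of-bag") form the test set.
    """
    rng = random.Random(seed)
    n = len(rows)
    # Sampling with replacement: the same index may be picked several times.
    picked = [rng.randrange(n) for _ in range(n)]
    chosen = set(picked)
    train = [rows[i] for i in picked]
    test = [rows[i] for i in range(n) if i not in chosen]
    return train, test


rows = list(range(20))
train, test = bootstrap_split(rows)
```

The training set has the same size as the original data but typically contains duplicates, which is why some instances are left over for testing.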
Measurement
Confusion Matrix
Precision = TP / (TP+FP)
Recall = TP / (TP+FN)
Error = (FP+FN) / (P+N), where P = TP+FN and N = FP+TN
Accuracy = (TP+TN) / (P+N)
FP Rate = FP/N
F1-Score = harmonic mean of precision and recall
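The measures above can be computed directly from the four confusion-matrix counts. This is a sketch under stated assumptions: the function name `confusion_metrics`, the choice to return 0.0 when a denominator is zero, and the example counts are all illustrative.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute the listed measures from confusion-matrix counts."""

    def ratio(num, den):
        # Assumption: report 0.0 instead of failing when a denominator is zero.
        return num / den if den else 0.0

    p = tp + fn  # actual positives
    n = fp + tn  # actual negatives
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "error": ratio(fp + fn, p + n),
        "accuracy": ratio(tp + tn, p + n),
        "fp_rate": ratio(fp, n),
        # Harmonic mean of precision and recall.
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts, for illustration only.
m = confusion_metrics(tp=40, fp=10, fn=20, tn=30)
```

Error and accuracy always sum to 1, which gives a quick sanity check on the counts.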