[Figure: workflow diagram; only the box labels survive extraction. Labeled/unlabeled data are partitioned into clusters/folds $$K_1$$, $$K_2$$, ..., $$K_n$$, followed by data extraction. Unlabeled data: active learning extraction (fold random sampling OR fold weighted sampling, n samples per fold); oracle(s) add new labeled data. Constraints: balance class distribution; same cluster size (approx.); balance labeled/unlabeled ratio; account for dead zones. Labeled data: train set and test set (symmetric random extraction, n residuals per fold). Constraints: distance between train and test points; stratification criteria. ML classification training, then cross-cluster/fold validation (min spatial autocorrelation).]