Figure: workflow diagram (node labels recovered from the figure, in reading order)

Labeled/unlabeled data, partitioned into clusters/folds $$K_1$$, $$K_2$$, ..., $$K_n$$.

Data extraction
  Unlabeled data: active learning extraction → oracle(s) → add new labeled data.
  Labeled data: test set drawn by fold random sampling OR fold weighted sampling.
  Constraints: balance class distribution; same cluster size (approx.); balance labeled/unlabeled ratio; n samples per fold; account for dead zones.

Train set
  Symmetric random extraction of n residuals per fold.
  Constraints: distance between train and test points; stratification criteria.

ML classification training → cross-cluster/fold validation (goal: minimize spatial autocorrelation).
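The diagram does not say how the folds are built, which classifier is trained, or what "fold weighted sampling" weights by. The following is a minimal, stdlib-only sketch of buffered cross-cluster/fold validation under stated assumptions, not the method behind the figure. Each point is assumed to be an `(x, y, label, fold)` tuple with folds already assigned. `weights` is a hypothetical per-point weight, `min_dist` is a hypothetical buffer radius that stands in for the train/test distance constraint, and 1-nearest-neighbour stands in for the ML classifier. All function names are illustrative.

```python
import math
import random

# Hypothetical record layout: (x, y, label, fold). The fold ids stand in for
# the clusters/folds K_1..K_n; how those clusters are built is not shown.

def sample_fold(points, fold, n, weights=None, rng=None):
    """Draw up to n indices from one fold, uniformly (fold random sampling) or,
    when per-point weights > 0 are given, weighted without replacement using
    Efraimidis-Spirakis keys (one possible reading of fold weighted sampling)."""
    if rng is None:
        rng = random.Random(0)
    idx = [i for i, p in enumerate(points) if p[3] == fold]
    k = min(n, len(idx))
    if weights is None:
        return rng.sample(idx, k)
    # Each index gets key u**(1/w); heavier points tend to get larger keys.
    keyed = sorted(idx, key=lambda i: rng.random() ** (1.0 / weights[i]), reverse=True)
    return keyed[:k]

def buffered_train(points, held_out_fold, test, min_dist):
    """Training candidates come from the other folds only; any candidate closer
    than min_dist to a test point is dropped (train/test distance constraint)."""
    test_xy = [points[i][:2] for i in test]
    return [i for i, p in enumerate(points)
            if p[3] != held_out_fold
            and all(math.dist(p[:2], t) >= min_dist for t in test_xy)]

def predict_1nn(points, train, xy):
    """Stand-in classifier (1-nearest neighbour); any classifier could be used."""
    nearest = min(train, key=lambda i: math.dist(points[i][:2], xy))
    return points[nearest][2]

def cross_fold_validation(points, n_test, min_dist, weights=None, seed=0):
    """Hold each fold out in turn, sample its test points, train on buffered
    points from the other folds, and return the accuracy per fold."""
    rng = random.Random(seed)
    scores = {}
    for fold in sorted({p[3] for p in points}):
        test = sample_fold(points, fold, n_test, weights, rng)
        train = buffered_train(points, fold, test, min_dist)
        if not test or not train:
            continue  # nothing to evaluate, or the buffer removed all training data
        hits = sum(predict_1nn(points, train, points[i][:2]) == points[i][2] for i in test)
        scores[fold] = hits / len(test)
    return scores
```

Usage: `cross_fold_validation(points, n_test=10, min_dist=500.0)` returns a `{fold: accuracy}` dict. In this sketch, training data for a held-out fold comes only from the other folds (a leave-one-cluster-out choice). The active learning loop, class balancing, and stratification from the diagram are not shown.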