Statistical Tests
Student's t-test
Assumes normally distributed data, but this is usually safe to ignore for reasonably large samples
Assumes both groups have the same variance (standard deviation); safe to ignore when the sample sizes are similar
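A minimal sketch of the two-sample Student's t statistic, assuming two independent samples and the equal-variance (pooled) form described above. The function name `student_t` is hypothetical. It returns only the statistic and its degrees of freedom, so the p-value still has to come from a t table or a library routine such as `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t statistic using the pooled (equal-variance) estimate.

    Hypothetical helper for illustration; returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    # Pooled variance: sample variances (n - 1 denominators) weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    return t, n1 + n2 - 2
```

For example, `student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])` gives t = -1.0 with 8 degrees of freedom.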
Definitions
Covariance
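Cov(X, Y) = E[(X - E[X])(Y - E[Y])], the mean of the product of deviations from each mean
describes how X and Y vary together: positive when they tend to increase together, negative when one tends to increase as the other decreases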
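A short sketch that computes covariance directly from the definition above. The name `covariance` and the `sample` flag are hypothetical: dividing by n matches the E[...] (population) definition, while dividing by n - 1 gives the usual unbiased sample estimate.

```python
def covariance(xs, ys, sample=False):
    """Mean of the product of deviations from each mean (hypothetical helper).

    sample=False divides by n (population, matches E[(X-E[X])(Y-E[Y])]);
    sample=True divides by n - 1 (unbiased sample covariance).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return s / (n - 1) if sample else s / n
```

With `xs = [1, 2, 3]` and `ys = [2, 4, 6]` the products of deviations are 2, 0 and 2, so the population covariance is 4/3; reversing `ys` makes it negative.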
Chi-Square
<= 20% of cells in the table have expected values < 5
for a 2x2 contingency table, no expected value < 10 is preferred
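A sketch of Pearson's chi-square statistic for a contingency table (no Yates continuity correction), plus a check of the expected-count rules of thumb listed above. Both function names are hypothetical; the p-value would be read from a chi-square table with the returned degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table given as rows of observed counts.

    Returns (statistic, degrees_of_freedom, expected_counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count of a cell = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected


def check_expected_counts(expected):
    """Rules of thumb: <= 20% of cells with expected count < 5,
    and for a 2x2 table preferably no expected count < 10."""
    cells = [e for row in expected for e in row]
    ok = sum(e < 5 for e in cells) <= 0.2 * len(cells)
    if len(expected) == 2 and len(expected[0]) == 2:
        ok = ok and min(cells) >= 10
    return ok
```

For the table `[[10, 20], [30, 40]]` the expected counts are 12, 18, 28 and 42, so the checks pass and the statistic is 50/63 with 1 degree of freedom.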
Cross Validation
k-fold, e.g. k = 10: split the data into 10 folds, 1 for test and 9 for train
repeat k times, each time with a different fold as the test set,
and use the average accuracy over the k runs
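A minimal k-fold cross-validation sketch following the steps above. The `train_fn` interface (takes training data, returns a `predict(x)` function), the shuffling with a fixed seed, and the toy `train_majority` baseline are all assumptions made for illustration, not part of the notes.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of (nearly) equal size.

    Assumes n >= k so that no fold is empty.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, train_fn, k=10, seed=0):
    """Average test accuracy over k runs; each fold is the test set exactly once.

    train_fn(train_xs, train_ys) is a hypothetical interface returning predict(x).
    """
    accuracies = []
    for test_idx in k_fold_indices(len(xs), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in test_set]
        predict = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct = sum(predict(xs[i]) == ys[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def train_majority(train_xs, train_ys):
    """Toy baseline: always predict the most common training label."""
    label = max(set(train_ys), key=train_ys.count)
    return lambda x: label
```

Any classifier can be plugged in as long as it follows the assumed `train_fn` shape, e.g. `cross_validate(xs, ys, train_majority, k=10)`.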
divide the training data R into R1 and a validation set V
use V to tune the parameters of a model trained on R1
use the tuned parameters to train the final model on all of R
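A sketch of the R1 / V tuning procedure above. The model is an assumption chosen for brevity: a one-dimensional k-nearest-neighbour classifier whose tuned parameter is k. The names `knn_train`, `accuracy` and `tune_and_retrain` and the 25% validation fraction are hypothetical.

```python
import random


def knn_train(xs, ys, k):
    """'Training' a k-NN classifier just stores the data; returns predict(x)."""
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        labels = [label for _, label in nearest]
        return max(set(labels), key=labels.count)  # majority vote; use odd k for 2 classes

    return predict


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


def tune_and_retrain(xs, ys, candidate_ks, val_fraction=0.25, seed=0):
    """Split R into R1 and V, pick k on V, then retrain on all of R."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_fraction)
    val, r1 = idx[:n_val], idx[n_val:]
    r1_xs, r1_ys = [xs[i] for i in r1], [ys[i] for i in r1]
    v_xs, v_ys = [xs[i] for i in val], [ys[i] for i in val]
    # 1. train on R1 and score on V for every candidate parameter
    best_k = max(candidate_ks, key=lambda k: accuracy(knn_train(r1_xs, r1_ys, k), v_xs, v_ys))
    # 2. retrain on all of R with the chosen parameter
    return best_k, knn_train(xs, ys, best_k)
```

The key point is step 2: V is only used to choose the parameter, and the final model still gets to learn from every example in R.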
Kolmogorov-Smirnov test
D statistic = max absolute difference between 2 cumulative distribution functions (CDF: the probability of the variable being <= a specified value x),
then look up D in a table of 'critical values of D for the KS test'
Non-parametric:
applicable to data that are not normally distributed
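A sketch of the two-sample D statistic built from empirical CDFs. The names `ecdf` and `ks_statistic` are hypothetical; for real analyses `scipy.stats.ks_2samp` computes D together with a p-value. Because both empirical CDFs are step functions that only jump at observed values, it is enough to evaluate the difference at the data points.

```python
from bisect import bisect_right


def ecdf(sample):
    """Empirical CDF: F(x) = fraction of observations <= x."""
    s = sorted(sample)
    n = len(s)
    return lambda x: bisect_right(s, x) / n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|."""
    fa, fb = ecdf(a), ecdf(b)
    # The supremum of the difference of two step functions is attained at a data point
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))
```

Completely separated samples give D = 1, identical samples give D = 0, and `ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])` is 0.5.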
ROC
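plots the true positive rate against the false positive rate as the decision threshold varies; as the true positive rate grows, the false positive rate also grows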
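A sketch of how ROC points arise from sweeping a threshold over classifier scores. The name `roc_points` and the convention "predict positive when score >= threshold" are assumptions. Lowering the threshold can only add predicted positives, which is why both rates rise together.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs from sweeping the threshold from high to low.

    labels: 1 = positive, 0 = negative; predict positive when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]` the curve passes through (0, 0.5), (0.5, 0.5), (0.5, 1.0) and ends at (1.0, 1.0).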
ANOVA
assumptions
data for each group are normally distributed (not important when each group has many cases, > 30)
data for each group have the same variance
(can be ignored when #cases of the largest group <= 1.5 * #cases of the smallest group)
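A sketch of the one-way ANOVA F statistic (between-group mean square over within-group mean square), plus the group-size rule of thumb from the assumptions above. Both function names are hypothetical; the p-value would come from an F table with the returned degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA: returns (F, df_between, df_within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def equal_variance_can_be_ignored(groups):
    """Rule of thumb: largest group has <= 1.5x the cases of the smallest group."""
    sizes = [len(g) for g in groups]
    return max(sizes) <= 1.5 * min(sizes)
```

For the groups `[1, 2, 3]`, `[4, 5, 6]` and `[7, 8, 9]`, the between-group sum of squares is 54 and the within-group sum is 6, giving F = 27 with (2, 6) degrees of freedom.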
McNemar's Test
compare paired proportions.
e.g. which classifier is better, C1 or C2, when both are evaluated on the same test cases?
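A sketch of McNemar's test for the C1 vs C2 example, assuming each classifier's result on the same test cases is recorded as correct/incorrect. Only the discordant cases matter: b (C1 right, C2 wrong) and c (C1 wrong, C2 right). The name `mcnemar` is hypothetical; the statistic is compared with a chi-square distribution with 1 degree of freedom (critical value about 3.841 at the 0.05 level).

```python
def mcnemar(c1_correct, c2_correct, continuity=True):
    """McNemar chi-square statistic from paired correct/incorrect outcomes.

    Returns (statistic, b, c) where b = C1 right & C2 wrong, c = C1 wrong & C2 right.
    """
    b = sum(1 for x, y in zip(c1_correct, c2_correct) if x and not y)
    c = sum(1 for x, y in zip(c1_correct, c2_correct) if not x and y)
    if b + c == 0:
        return 0.0, b, c  # the classifiers never disagree
    # Edwards' continuity correction subtracts 1 from |b - c|
    num = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    return num / (b + c), b, c
```

With b = 10 and c = 2 the corrected statistic is 49/12 (about 4.08), above 3.841, suggesting the two classifiers' error rates differ.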