Multiple Means
Type of Plot:
Side-by-side boxplot
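library(ggplot2)  # provides qplot()
# explanatory = grouping (categorical) variable, response = numeric variable
# Note: qplot() is deprecated as of ggplot2 3.4.0; ggplot() + geom_boxplot() is the current equivalent
qplot(x = explanatory,
      y = response,
      data = dataframe,
      geom = "boxplot")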
Parameters: \( \mu_1, \mu_2, \ldots, \mu_k \)
Point Estimates: \( \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k \)
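As a minimal sketch of how the point estimates could be computed, the snippet below takes the sample mean of the response within each level of the explanatory variable. The data frame and its values are made up for illustration; the column names simply mirror the placeholders used in the plotting call.

```r
# Hypothetical data: k = 3 groups of unequal size (values invented for illustration)
dataframe <- data.frame(
  explanatory = factor(rep(c("a", "b", "c"), times = c(4, 3, 5))),
  response    = c(5, 7, 6, 6, 9, 10, 8, 4, 5, 3, 4, 4)
)

# One sample mean per group: the point estimate of each mu_j
group_means <- tapply(dataframe$response, dataframe$explanatory, mean)

# Group sizes n_j, which enter the F statistic later on
group_sizes <- tapply(dataframe$response, dataframe$explanatory, length)

print(group_means)
print(group_sizes)
```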
Hypothesis Test
Hypotheses:
\( H_0: \mu_1 = \mu_2 = \ldots = \mu_k \)
\( H_a: \) At least one \( \mu_i \) is different
for \( i \in \{ 1, \ldots, k \} \)
Test Statistic Random Variable (Assuming \(H_0\) is true):
\( F = \dfrac{\text{between-group variability}}{\text{within-group variability}} = \dfrac{\dfrac{1}{k - 1} \sum_j n_j (\bar{X}_j - \bar{X})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (X_{ij} - \bar{X}_j)^2} \)
\( \quad \quad \sim \text{(Fisher's) } F (df_G = k - 1, df_E = n_{total} - k)\)
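where \( \bar{X} = \) the mean response over all observations (the grand mean), \( \bar{X}_j = \) the mean response in group \( j \), \( n_j = \) the number of observations in group \( j \), and \( X_{ij} = \) the \( i \)-th observation in group \( j \)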
Observed Test Statistic:
\( f_{obs} = \dfrac{MSG}{MSE} = \dfrac{ \dfrac{1}{k - 1} \sum_j n_j (\bar{x}_{j, obs} - \bar{x}_{obs})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (x_{ij, obs} - \bar{x}_{j, obs})^2}\)
\( \mathbf{\textit{P}}\)-value:
\( \mathbb{P}(F \ge f_{obs}) \)
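The observed statistic and P-value can be sketched directly from the formulas above and then compared with R's built-in ANOVA table. The data are hypothetical, and the variable names (`msg`, `mse`, `f_obs`, `p_value`) are chosen here only to match the notation; `anova(lm(...))` is one standard way to get the same table, not necessarily the one intended by this sheet.

```r
# Hypothetical data: k = 3 groups of unequal size (values invented for illustration)
dataframe <- data.frame(
  explanatory = factor(rep(c("a", "b", "c"), times = c(4, 3, 5))),
  response    = c(5, 7, 6, 6, 9, 10, 8, 4, 5, 3, 4, 4)
)
y <- dataframe$response
g <- dataframe$explanatory

k       <- nlevels(g)            # number of groups
n_total <- length(y)             # total number of observations
n_j     <- tapply(y, g, length)  # group sizes
xbar_j  <- tapply(y, g, mean)    # group means
xbar    <- mean(y)               # grand mean over all observations

# Mean square between groups (MSG) and mean square error (MSE)
msg <- sum(n_j * (xbar_j - xbar)^2) / (k - 1)
mse <- sum((y - ave(y, g))^2) / (n_total - k)  # ave() repeats each group's mean per row

f_obs   <- msg / mse
# Upper-tail probability P(F >= f_obs) under F(k - 1, n_total - k)
p_value <- pf(f_obs, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)

# Cross-check against the built-in one-way ANOVA table
fit <- anova(lm(response ~ explanatory, data = dataframe))
print(c(f_obs = f_obs, p_value = p_value))
print(fit)
```

The first row of `fit` should report the same F value and P-value as the hand computation, which is a quick way to confirm the degrees of freedom were set up correctly.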
Conditions for Distributional Approximation (To \( F\)):
- Independent observations within and across groups
- Nearly normal populations in each group OR large sample sizes in each group ( \( n_j \ge 30 \) )
- Variability across the groups is similar
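A rough way to inspect the sample-size and variability conditions is to tabulate each group's size and standard deviation, alongside the side-by-side boxplot. The sketch below uses hypothetical data and does not apply any formal cutoff for "similar" variability; that judgment is left to the reader. Independence cannot be checked this way and depends on how the data were collected.

```r
# Hypothetical data: k = 3 groups of unequal size (values invented for illustration)
dataframe <- data.frame(
  explanatory = factor(rep(c("a", "b", "c"), times = c(4, 3, 5))),
  response    = c(5, 7, 6, 6, 9, 10, 8, 4, 5, 3, 4, 4)
)

# Per-group sample size and standard deviation
group_n  <- tapply(dataframe$response, dataframe$explanatory, length)
group_sd <- tapply(dataframe$response, dataframe$explanatory, sd)

# Collect into one summary table for side-by-side comparison
conditions <- data.frame(
  group = names(group_n),
  n     = as.integer(group_n),
  sd    = as.numeric(group_sd)
)
print(conditions)

# Ratio of largest to smallest group SD: closer to 1 means more similar spread
sd_ratio <- max(group_sd) / min(group_sd)
print(sd_ratio)
```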
Confidence Interval does not apply