Multiple Means

Type of Plot:
Side-by-side boxplot

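library(ggplot2)  # provides qplot(); deprecated since ggplot2 3.4.0 but still available

qplot(x = explanatory,   # categorical grouping variable
      y = response,      # numeric response variable
      data = dataframe,
      geom = "boxplot")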

Parameters: $\mu_1, \mu_2, \ldots, \mu_k$
Point Estimates: $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k$

Hypothesis Test

Hypotheses:
$ H_0: \mu_1 = \mu_2 = \ldots = \mu_k $
$ H_a: $ At least one $ \mu_i $ differs from the others,
for $ i \in \{ 1, \ldots, k \}$

Test Statistic Random Variable (Assuming $H_0$ is true):
$ F = \dfrac{\text{between-group variability}}{\text{within-group variability}} = \dfrac{\dfrac{1}{k - 1} \sum_j n_j (\bar{X}_j - \bar{X})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (X_{ij} - \bar{X}_j)^2} $

$ \quad \quad \sim \text{(Fisher's) } F (df_G = k - 1, df_E = n_{total} - k)$

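where $ \bar{X} = $ the grand mean (the mean response over all $ n_{total} $ observations, pooled across groups), $ \bar{X}_j $ and $ n_j $ are the sample mean and sample size of group $ j $, and $ X_{ij} $ is the $ i $-th observation in group $ j $. The numerator is the mean square between groups ($ MSG $) and the denominator is the mean square error ($ MSE $).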

Observed Test Statistic:
$ f_{obs} = \dfrac{MSG}{MSE} = \dfrac{ \dfrac{1}{k - 1} \sum_j n_j (\bar{x}_{j, obs} - \bar{x}_{obs})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (x_{ij, obs} - \bar{x}_{j, obs})^2}$

$ \mathbf{\textit{P}}$-value:
$ \mathbb{P}(F \ge f_{obs}) $
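
As a rough illustration, the observed test statistic and P-value can be computed by hand in R and compared with the built-in `aov()`. This is only a sketch under stated assumptions: R's built-in `PlantGrowth` data stands in for `dataframe`, with `weight` as the response and `group` as the explanatory variable. The variable names (`msg`, `mse`, `f_obs`, `p_value`) are illustrative and not part of the original notes.

```r
# Assumption: PlantGrowth (built into R) plays the role of `dataframe`.
df          <- PlantGrowth
response    <- df$weight   # numeric response
explanatory <- df$group    # factor with k levels

k       <- nlevels(explanatory)                   # number of groups
n_total <- length(response)                       # total sample size
n_j     <- tapply(response, explanatory, length)  # group sizes
xbar_j  <- tapply(response, explanatory, mean)    # group means
xbar    <- mean(response)                         # grand mean

# Mean square between groups: weighted squared deviations of group means
msg <- sum(n_j * (xbar_j - xbar)^2) / (k - 1)

# Mean square error: squared deviations of each observation from its own
# group mean; ave() returns the group mean aligned with each observation
mse <- sum((response - ave(response, explanatory))^2) / (n_total - k)

f_obs   <- msg / mse
p_value <- pf(f_obs, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)

# Cross-check against R's one-way ANOVA table
fit <- summary(aov(weight ~ group, data = df))[[1]]
```

Printing `c(f_obs, fit[["F value"]][1])` and `c(p_value, fit[["Pr(>F)"]][1])` should show matching pairs, since `aov()` performs the same calculation.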

Conditions for Distributional Approximation (To $ F$):

Independent observations within and across groups
Nearly normal populations in each group OR large sample sizes ( $ n_j \ge 30 $ in every group )
Similar variability (roughly equal standard deviations) in every group
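
The sample-size and variability conditions above can be checked numerically, alongside the side-by-side boxplot. The sketch below again assumes the built-in `PlantGrowth` data in place of `dataframe`. The cutoff of 2 for the ratio of the largest to the smallest group standard deviation is a common rule of thumb, not something stated in these notes.

```r
# Assumption: PlantGrowth (built into R) plays the role of `dataframe`.
df <- PlantGrowth

group_n  <- tapply(df$weight, df$group, length)  # sample size per group
group_sd <- tapply(df$weight, df$group, sd)      # standard deviation per group

# Are all groups large enough to rely on the large-sample condition?
all_large <- all(group_n >= 30)

# Ratio of largest to smallest group standard deviation.
# Rule of thumb (an assumption here): a ratio below about 2 suggests
# the variability is similar enough across groups.
sd_ratio   <- max(group_sd) / min(group_sd)
similar_sd <- sd_ratio < 2
```

When `all_large` is `FALSE`, the near-normality condition has to carry the weight, so inspect each group's distribution (for example with the boxplot or a normal quantile plot) before trusting the $ F $ approximation.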

Confidence Interval does not apply

Example Problem 2