Multiple Means

Type of Plot:
Side-by-side boxplot

library(ggplot2)
qplot(x = explanatory,     # categorical grouping variable
      y = response,        # numeric response variable
      data = dataframe,
      geom = "boxplot")

Parameters: \( \mu_1, \mu_2, \ldots, \mu_k \)
Point Estimates: \( \bar{X}_1, \bar{X}_2, \ldots, \bar{X}_k \)

Hypothesis Test

Hypotheses:
\( H_0: \mu_1 = \mu_2 = \ldots = \mu_k \)
\( H_a: \) At least one \( \mu_i \) differs from the others,
for \( i \in \{ 1, \ldots, k \} \)

Test Statistic Random Variable (Assuming \(H_0\) is true):
\( F = \dfrac{\text{between-group variability}}{\text{within-group variability}} = \dfrac{\dfrac{1}{k - 1} \sum_j n_j (\bar{X}_j - \bar{X})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (X_{ij} - \bar{X}_j)^2} \)


\( \quad \quad \sim \text{(Fisher's) } F (df_G = k - 1, df_E = n_{total} - k)\)


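where \( \bar{X} = \) the mean response over all groups, \( \bar{X}_j = \) the mean response in group \( j \), \( n_j = \) the number of observations in group \( j \), and \( X_{ij} = \) the \( i \)-th observation in group \( j \)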

Observed Test Statistic:
\( f_{obs} = \dfrac{MSG}{MSE} = \dfrac{ \dfrac{1}{k - 1} \sum_j n_j (\bar{x}_{j, obs} - \bar{x}_{obs})^2}{\dfrac{1}{n_{total} - k} \sum_{i, j} (x_{ij, obs} - \bar{x}_{j, obs})^2}\)

\( \mathbf{\textit{P}}\)-value:
\( \mathbb{P}(F \ge f_{obs}) \)
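
The observed test statistic and P-value above can be computed directly from their formulas. The following is a minimal sketch in R, assuming the built-in PlantGrowth data set as a stand-in for your own data; the variable names (msg, mse, f_obs, p_value) are illustrative and not part of any package.

```r
# Illustrative data: PlantGrowth ships with base R (datasets package)
response    <- PlantGrowth$weight   # numeric response
explanatory <- PlantGrowth$group    # factor defining the k groups

k       <- nlevels(explanatory)                   # number of groups
n_total <- length(response)                       # total sample size
n_j     <- tapply(response, explanatory, length)  # group sizes
xbar_j  <- tapply(response, explanatory, mean)    # group means
xbar    <- mean(response)                         # mean over all groups

# Mean square between groups (MSG) and within groups (MSE)
msg <- sum(n_j * (xbar_j - xbar)^2) / (k - 1)
mse <- sum((response - ave(response, explanatory))^2) / (n_total - k)

# Observed F statistic and its upper-tail probability under H_0
f_obs   <- msg / mse
p_value <- pf(f_obs, df1 = k - 1, df2 = n_total - k, lower.tail = FALSE)

# Cross-check against R's built-in ANOVA table
fit <- anova(lm(weight ~ group, data = PlantGrowth))
```

The first row of fit should report the same F value and P-value as the manual calculation, which is a quick way to confirm the formulas were applied correctly.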

Conditions for Distributional Approximation (To \( F\)):

  1. Independent observations within and across groups
  2. Nearly normal populations in each group OR large sample sizes in each group ( \( n_j \ge 30 \) )
  3. Variability across the groups is similar
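
Conditions 2 and 3 can be screened numerically before running the test. The sketch below again assumes the built-in PlantGrowth data as a placeholder. The cutoff of 2 for the ratio of largest to smallest standard deviation is a common rule of thumb, not a requirement stated in these notes, and the Shapiro-Wilk test is only one of several ways to look for departures from normality.

```r
# Illustrative data: PlantGrowth ships with base R (datasets package)
response    <- PlantGrowth$weight
explanatory <- PlantGrowth$group

# Condition 2: sample size and a rough normality screen per group
group_n   <- tapply(response, explanatory, length)
shapiro_p <- tapply(response, explanatory,
                    function(x) shapiro.test(x)$p.value)

# Condition 3: compare the spread across groups
group_sd <- tapply(response, explanatory, sd)
sd_ratio <- max(group_sd) / min(group_sd)

# Rule-of-thumb flag (assumed cutoff of 2, see the note above)
similar_spread <- sd_ratio < 2

print(data.frame(n = group_n, sd = group_sd, shapiro_p = shapiro_p))
```

Condition 1 (independence) depends on how the data were collected and cannot be verified from the numbers alone.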

Confidence Interval does not apply