Two Means
(Independent Samples)
Parameter: \( \mu_1 - \mu_2 \)
Point Estimate: \( \bar{X}_1 - \bar{X}_2 \)
Hypothesis Test
Hypotheses:
\( H_0: \mu_1 - \mu_2 = 0 \)
1) \( H_a: \mu_1 - \mu_2 < 0 \)
2) \( H_a: \mu_1 - \mu_2 \ne 0 \)
3) \( H_a: \mu_1 - \mu_2 > 0 \)
Test Statistic Random Variable (Assuming \(H_0\) is true):
\( T =\dfrac{ (\bar{X}_1 - \bar{X}_2) - 0}{ \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} } \sim t (df = \min(n_1 - 1, n_2 - 1)) \)
Observed Test Statistic:
\( t_{obs} =\dfrac{ (\bar{x}_{1, obs} - \bar{x}_{2, obs}) - 0}{ \sqrt{\dfrac{s_{1, obs}^2}{n_1} + \dfrac{s_{2, obs}^2}{n_2}} }\)
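As a rough illustration, the observed test statistic can be computed from summary statistics in base R. The helper name `two_mean_t` and the numbers passed to it are hypothetical and chosen only to show the arithmetic.

```r
# Sketch: observed t statistic for two independent samples.
# two_mean_t is a hypothetical helper, not part of any package.
two_mean_t <- function(xbar1, s1, n1, xbar2, s2, n2, null_diff = 0) {
  # Standard error of the difference in sample means
  se <- sqrt(s1^2 / n1 + s2^2 / n2)
  # Observed statistic under H0: mu1 - mu2 = null_diff
  t_obs <- ((xbar1 - xbar2) - null_diff) / se
  # Conservative degrees of freedom, as in the formula above
  df <- min(n1 - 1, n2 - 1)
  list(t_obs = t_obs, se = se, df = df)
}

# Illustrative (made-up) summary statistics
res <- two_mean_t(xbar1 = 10.2, s1 = 2.0, n1 = 25,
                  xbar2 = 9.0,  s2 = 2.5, n2 = 30)
print(res)
```

With raw data, R's built-in `t.test()` computes the same statistic, but by default it uses the Welch-Satterthwaite degrees of freedom rather than \( \min(n_1 - 1, n_2 - 1) \), so its p-value will usually differ slightly from the conservative approach here.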
\( \mathbf{\textit{P}}\)-value:
1) \( \mathbb{P}(\bar{X}_1 - \bar{X}_2 \le \bar{x}_{1, obs} - \bar{x}_{2, obs}) = \mathbb{P}(T \le t_{obs}) \)
2) \( \mathbb{P}\left( \big| \bar{X}_1 - \bar{X}_2 \big| \ge \big| \bar{x}_{1, obs} - \bar{x}_{2, obs} \big| \right) = \mathbb{P} \left( \big| T \big| \ge \big| t_{obs} \big| \right) \)
3) \( \mathbb{P}(\bar{X}_1 - \bar{X}_2 \ge \bar{x}_{1, obs} - \bar{x}_{2, obs}) = \mathbb{P}(T \ge t_{obs}) \)
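The three p-values above map onto calls to `pt()`, the t distribution's cumulative distribution function in base R. The values of `t_obs` and `df` below are hypothetical placeholders; this is a sketch, not a prescribed workflow.

```r
# Hypothetical inputs: an observed statistic and conservative df
t_obs <- 1.98
df <- 24   # assumed to equal min(n1 - 1, n2 - 1)

# 1) Ha: mu1 - mu2 < 0  (lower tail)
p_less <- pt(t_obs, df = df)

# 2) Ha: mu1 - mu2 != 0 (both tails)
p_two_sided <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

# 3) Ha: mu1 - mu2 > 0  (upper tail)
p_greater <- pt(t_obs, df = df, lower.tail = FALSE)

print(c(less = p_less, two_sided = p_two_sided, greater = p_greater))
```

Only the p-value matching the alternative hypothesis chosen before seeing the data should be reported.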
Conditions for Distributional Approximation (To \( T\)):
- Independent observations in both samples
- Nearly normal populations OR large sample sizes ( \( n_1 \ge 30 \) and \( n_2 \ge 30 \) )
- Independently selected samples
Confidence Interval
Formula for CI:
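\( (\bar{x}_{1, obs} - \bar{x}_{2, obs}) \pm t^*_{df} \cdot \sqrt{\dfrac{s_{1, obs}^2}{n_1} + \dfrac{s_{2, obs}^2}{n_2}} \)
where \( t^*_{df} \) is the critical value for the chosen confidence level and \( df = \min(n_1 - 1, n_2 - 1) \)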
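A minimal sketch of the interval in base R, using `qt()` for the critical value. The summary statistics and the 95% confidence level are made-up assumptions for illustration, and the degrees of freedom are assumed to follow the same \( \min(n_1 - 1, n_2 - 1) \) rule as the hypothesis test.

```r
# Illustrative (made-up) summary statistics
xbar1 <- 10.2; s1 <- 2.0; n1 <- 25
xbar2 <- 9.0;  s2 <- 2.5; n2 <- 30
conf_level <- 0.95   # assumed confidence level

# Conservative degrees of freedom and matching critical value t*
df <- min(n1 - 1, n2 - 1)
t_star <- qt(1 - (1 - conf_level) / 2, df = df)

# Standard error of the difference in sample means
se <- sqrt(s1^2 / n1 + s2^2 / n2)

# Point estimate plus/minus margin of error
ci <- (xbar1 - xbar2) + c(-1, 1) * t_star * se
print(ci)
```

If the interval contains 0, the data are consistent with no difference between the two population means at that confidence level.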
Conditions for Distributional Approximation (To \( T\)):
- Independent observations in both samples
- Nearly normal populations OR large sample sizes ( \( n_1 \ge 30 \) and \( n_2 \ge 30 \) )
- Independently selected samples
Type of Plot:
Side-by-side boxplot
library(ggplot2)  # provides qplot()
# explanatory: grouping variable; response: numeric variable
qplot(x = explanatory,
      y = response,
      data = dataframe,
      geom = "boxplot")