Two Proportions

Type of Plot

Stacked bar chart

# qplot() comes from the ggplot2 package
library(ggplot2)
qplot(x = explanatory,
      data = dataframe,
      fill = response,
      geom = "bar")

Mosaic plot

mosaicplot(table(dataframe$explanatory, dataframe$response),
           xlab = "Explanatory",
           ylab = "Response",
           main = "Explanatory vs Response")

Parameter: \( P_1 - P_2 \)
Point Estimate: \( \hat{P}_1 - \hat{P}_2 \)

Hypothesis Test

Hypotheses:
\( H_0: P_1 - P_2 = 0 \)
1) \( H_a: P_1 - P_2 < 0 \)
2) \( H_a: P_1 - P_2 \ne 0 \)
3) \( H_a: P_1 - P_2 > 0 \)

Test Statistic Random Variable (Assuming \(H_0\) is true):
\( Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \)


where \( \hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}} \) is the pooled sample proportion, computed across both groups combined

Observed Test Statistic:
\( z_{obs} = \dfrac{ \hat{p}_{1, obs} - \hat{p}_{2, obs} }{\sqrt{\dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_1} + \dfrac{\hat{p}_{obs}(1 - \hat{p}_{obs})}{n_2} } } \)
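
As a rough illustration of the observed test statistic, the sketch below computes \( z_{obs} \) in R from raw counts. The counts `x1`, `n1`, `x2`, `n2` and all variable names are hypothetical, chosen only for the example.

```r
# Hypothetical counts, for illustration only (not taken from any real data)
x1 <- 45; n1 <- 100   # successes and sample size in group 1
x2 <- 30; n2 <- 100   # successes and sample size in group 2

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Pooled proportion: total number of successes / total number of cases
p_pool <- (x1 + x2) / (n1 + n2)

# Standard error under H_0, using the pooled proportion in both terms
se_pool <- sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)

z_obs <- (p1_hat - p2_hat) / se_pool
z_obs   # about 2.19 for these counts
```

One way to sanity-check the result is against `prop.test(c(x1, x2), c(n1, n2), correct = FALSE)`, whose reported `X-squared` statistic should equal `z_obs^2`.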

\( \mathbf{\textit{P}}\)-value:
1) \( \mathbb{P}(\hat{P}_1 - \hat{P}_2 \le \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \le z_{obs}) \)
2) \( \mathbb{P}(\big| \hat{P}_1 - \hat{P}_2 \big| \ge \big| \hat{p}_{1, obs} - \hat{p}_{2, obs} \big|) = \mathbb{P}(\big| Z \big| \ge \big| z_{obs} \big|) = 2 \cdot \mathbb{P}(Z \ge \big| z_{obs} \big|) \)
3) \( \mathbb{P}(\hat{P}_1 - \hat{P}_2 \ge \hat{p}_{1, obs} - \hat{p}_{2, obs}) = \mathbb{P}(Z \ge z_{obs}) \)
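
The three \( P \)-values above can be sketched with `pnorm()`, the standard normal CDF in base R. The value of `z_obs` and the variable names here are hypothetical, used only to show which tail each alternative corresponds to.

```r
# Hypothetical observed test statistic, for illustration only
z_obs <- 2.19

# 1) H_a: P_1 - P_2 < 0   -> lower tail, P(Z <= z_obs)
p_less <- pnorm(z_obs)

# 2) H_a: P_1 - P_2 != 0  -> both tails, P(|Z| >= |z_obs|)
p_two_sided <- 2 * pnorm(abs(z_obs), lower.tail = FALSE)

# 3) H_a: P_1 - P_2 > 0   -> upper tail, P(Z >= z_obs)
p_greater <- pnorm(z_obs, lower.tail = FALSE)

c(less = p_less, two_sided = p_two_sided, greater = p_greater)
```

Using `abs(z_obs)` in the two-sided case keeps the result correct whether the observed difference is positive or negative.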

Conditions for Distributional Approximation (To \( Z\)):

  1. Independent observations in each sample
  2. Number of pooled successes \( n_i \hat{p}_{obs} \) and pooled failures \( n_i (1 - \hat{p}_{obs}) \) is at least 10 for each group
  3. Independent selection of samples
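
Condition 2 can be checked numerically. The sketch below assumes the usual reading of "pooled successes and failures", namely each group's sample size multiplied by the pooled proportion (or one minus it). The counts and names are hypothetical.

```r
# Hypothetical counts, for illustration only
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100

p_pool <- (x1 + x2) / (n1 + n2)

# Expected successes and failures in each group under the pooled proportion
expected <- c(
  group1_success = n1 * p_pool,
  group1_failure = n1 * (1 - p_pool),
  group2_success = n2 * p_pool,
  group2_failure = n2 * (1 - p_pool)
)

expected
condition_met <- all(expected >= 10)
condition_met
```

Independence (conditions 1 and 3) depends on how the data were collected and cannot be verified from the counts alone.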

Confidence Interval

Formula for CI:
\( (\hat{p}_{1, obs} - \hat{p}_{2, obs}) \pm z^* \sqrt{ \dfrac{\hat{p}_{1, obs}(1 - \hat{p}_{1, obs})}{n_1} + \dfrac{\hat{p}_{2, obs}(1 - \hat{p}_{2, obs})}{n_2} }\)
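
A minimal sketch of the interval formula in R follows. Unlike the test statistic, the standard error here uses each group's own observed proportion rather than the pooled one. The counts, the 95% level, and the variable names are hypothetical.

```r
# Hypothetical counts and confidence level, for illustration only
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100
conf_level <- 0.95

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Critical value z* for a two-sided interval
z_star <- qnorm(1 - (1 - conf_level) / 2)

# Unpooled standard error: each group contributes its own p-hat
se_unpooled <- sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

ci <- (p1_hat - p2_hat) + c(-1, 1) * z_star * se_unpooled
ci
```

For comparison, `prop.test(c(x1, x2), c(n1, n2), conf.level = conf_level, correct = FALSE)$conf.int` should give the same endpoints, since without the continuity correction it uses the same unpooled standard error.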

Conditions for Distributional Approximation (To \( Z\)):

  1. Independent observations in each sample
  2. Number of observed successes and observed failures is at least 10 for each group
  3. Independent selection of samples
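
For the confidence interval, condition 2 uses the observed counts directly rather than pooled ones. A quick check might look like this, with hypothetical counts and names:

```r
# Hypothetical counts, for illustration only
x1 <- 45; n1 <- 100
x2 <- 30; n2 <- 100

# Observed successes and failures in each group
observed <- c(
  group1_success = x1,
  group1_failure = n1 - x1,
  group2_success = x2,
  group2_failure = n2 - x2
)

observed
ci_condition_met <- all(observed >= 10)
ci_condition_met
```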